<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Joni Sar</title>
    <description>The latest articles on DEV Community by Joni Sar (@jonisar).</description>
    <link>https://dev.to/jonisar</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F13629%2F82514506-b658-415e-8114-a7f2e04c4ff3.png</url>
      <title>DEV Community: Joni Sar</title>
      <link>https://dev.to/jonisar</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/jonisar"/>
    <language>en</language>
    <item>
      <title>Introducing QueryFlux: Open-Source Universal Multi-Engine Query Router and SQL Proxy</title>
      <dc:creator>Joni Sar</dc:creator>
      <pubDate>Mon, 06 Apr 2026 09:07:13 +0000</pubDate>
      <link>https://dev.to/jonisar/introducing-queryflux-multi-engine-query-router-and-universal-sql-proxy-19e9</link>
      <guid>https://dev.to/jonisar/introducing-queryflux-multi-engine-query-router-and-universal-sql-proxy-19e9</guid>
      <description>&lt;p&gt;Efficiently routing queries across multiple query engines is a critical challenge.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://queryflux.dev/" rel="noopener noreferrer"&gt;QueryFlux&lt;/a&gt; is a universal SQL proxy and multi-engine query router written in Rust. It sits between clients and query engines. Clients connect to QueryFlux using a protocol they already know. QueryFlux routes each query to the right backend, translates SQL dialects when needed, enforces concurrency limits, and gives you a unified observability surface.&lt;/p&gt;

&lt;p&gt;Open table formats unified the data. QueryFlux unifies the access.&lt;/p&gt;

&lt;p&gt;If you already run more than one query engine, you know the problem is not only where data lives. The harder part is how query access works in practice.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which engine should this run on?&lt;/li&gt;
&lt;li&gt;Which client should connect where?&lt;/li&gt;
&lt;li&gt;How do you protect low-latency traffic from batch workloads?&lt;/li&gt;
&lt;li&gt;What happens when one cluster is saturated?&lt;/li&gt;
&lt;li&gt;How much routing logic ends up hardcoded across the stack?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is the problem &lt;a href="https://queryflux.dev/" rel="noopener noreferrer"&gt;QueryFlux&lt;/a&gt; is built to solve.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why QueryFlux exists
&lt;/h2&gt;

&lt;p&gt;Modern data platforms are multi-engine by design.&lt;/p&gt;

&lt;p&gt;A team may use Trino for federated queries, DuckDB for embedded analytics, StarRocks for low-latency serving, and Athena for pay-per-scan workloads on cold data. That mix is not a sign of architectural drift. In many cases, it is the right shape of the system.&lt;/p&gt;

&lt;p&gt;Open table formats made this possible. With Apache Iceberg, Delta Lake, or Hudi, multiple engines can read the same data in object storage without duplicating it. That solved storage interoperability.&lt;/p&gt;

&lt;p&gt;What it did not solve is compute access.&lt;/p&gt;

&lt;p&gt;Each engine still comes with its own protocol, its own SQL dialect, its own connection handling, and its own operational behavior. Clients still need to know where to connect. Routing logic still leaks into notebooks, applications, dashboards, and team conventions. Capacity management is still fragmented across backends.&lt;/p&gt;

&lt;p&gt;QueryFlux adds the missing layer above the table format: one access layer in front of the engine fleet.&lt;/p&gt;

&lt;h2&gt;
  
  
  What QueryFlux does
&lt;/h2&gt;

&lt;p&gt;At a high level, QueryFlux handles three things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;protocol ingestion&lt;/li&gt;
&lt;li&gt;routing&lt;/li&gt;
&lt;li&gt;dispatch and dialect translation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Clients connect using protocols they already speak. QueryFlux supports:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Trino HTTP&lt;/li&gt;
&lt;li&gt;PostgreSQL wire&lt;/li&gt;
&lt;li&gt;MySQL wire&lt;/li&gt;
&lt;li&gt;Arrow Flight SQL&lt;/li&gt;
&lt;li&gt;Admin REST API&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;On the backend side, it already supports:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Trino&lt;/li&gt;
&lt;li&gt;DuckDB&lt;/li&gt;
&lt;li&gt;StarRocks&lt;/li&gt;
&lt;li&gt;Athena&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That gives it a very specific place in the stack. It is not trying to replace engines, and it is not introducing a custom client model. It is making a heterogeneous engine fleet look coherent from the access layer.&lt;/p&gt;

&lt;h2&gt;
  
  
  How a query flows through the system
&lt;/h2&gt;

&lt;p&gt;A client connects to QueryFlux using a native protocol.&lt;/p&gt;

&lt;p&gt;The query is evaluated against an ordered routing chain.&lt;/p&gt;

&lt;p&gt;The first matching rule selects the cluster group that should handle the query.&lt;/p&gt;

&lt;p&gt;From there, QueryFlux selects a healthy cluster in that group, optionally rewrites the SQL into the target dialect using sqlglot, and dispatches the query.&lt;/p&gt;

&lt;p&gt;If the group is already at its concurrency limit, the query can queue at the proxy instead of failing immediately.&lt;/p&gt;

&lt;p&gt;That is the important design move. QueryFlux is not just a forwarder. It is the runtime layer where access, routing, translation, and capacity handling meet.&lt;/p&gt;
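
&lt;p&gt;The flow above is easy to sketch. The snippet below is an illustrative toy, not QueryFlux's actual code or config schema: it evaluates a query against an ordered chain and returns the first matching rule's cluster group, with a fallback at the end.&lt;/p&gt;

```python
# Toy sketch of first-match routing over an ordered rule chain.
# Rule shapes and field names are hypothetical, not QueryFlux's actual schema.
import re

ROUTES = [
    {"name": "fast_queries", "query_regex": r"SELECT .* LIMIT \d+", "target": "duckdb_group"},
    {"name": "dashboard_queries", "protocol": "mysql", "target": "starrocks_group"},
    {"name": "heavy_analytics", "query_regex": r"JOIN|GROUP BY|WINDOW", "target": "trino_group"},
    {"name": "fallback", "fallback": True, "target": "athena_group"},
]

def route(sql, protocol):
    """Return the target group of the first rule that matches."""
    for rule in ROUTES:
        if rule.get("fallback"):
            return rule["target"]  # unconditional catch-all
        if "protocol" in rule and rule["protocol"] != protocol:
            continue
        if "query_regex" in rule and not re.search(rule["query_regex"], sql):
            continue
        return rule["target"]
    return None
```

&lt;p&gt;Ordering is the point: a query that matches several rules lands wherever the first match sends it, which is what makes the policy predictable and traceable.&lt;/p&gt;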

&lt;h2&gt;
  
  
  Architecture
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Client (psql / Trino CLI / mysql / BI tool)
    │
    │ native protocol
    ▼
┌─────────────────────────────────────────────┐
│                 QueryFlux                   │
│                                             │
│  Frontend ──► Router ──► Dialect translation│
│                    │                        │
│              Cluster group                  │
│         (concurrency limit + queue)         │
└──────────────────┬──────────────────────────┘
                   │
      ┌────────────┼────────────┐
      ▼            ▼            ▼
   Trino       StarRocks      Athena
      └────────────┴────────────┘
          Apache Iceberg / Delta / Hudi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The architecture is simple enough to understand quickly, but deep enough to be useful in real environments.&lt;/p&gt;

&lt;p&gt;The simplicity is at the edge. Clients keep using the protocols they already know.&lt;/p&gt;

&lt;p&gt;The depth is inside the routing and dispatch path, where QueryFlux can apply routing policy, translation, concurrency limits, queueing, health-aware selection, and load balancing without pushing that complexity back into every client.&lt;/p&gt;

&lt;h2&gt;
  
  
  Routing is where the value becomes obvious
&lt;/h2&gt;

&lt;p&gt;QueryFlux evaluates each query against an ordered router chain.&lt;/p&gt;

&lt;p&gt;Routing can be based on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;protocol&lt;/li&gt;
&lt;li&gt;HTTP headers&lt;/li&gt;
&lt;li&gt;SQL text using regex&lt;/li&gt;
&lt;li&gt;client tags&lt;/li&gt;
&lt;li&gt;Python script logic&lt;/li&gt;
&lt;li&gt;compound rules&lt;/li&gt;
&lt;li&gt;fallback routing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That matters because real routing logic is rarely a single condition. In practice, you may want to steer PostgreSQL wire traffic to a low-latency group, send ETL-tagged traffic to a batch-oriented cluster, and use query patterns to catch common fast-path cases.&lt;/p&gt;

&lt;p&gt;A simple example looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;routes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;fast_queries&lt;/span&gt;
    &lt;span class="na"&gt;match&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;query_regex&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SELECT&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;.*&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;LIMIT&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s"&gt;d+"&lt;/span&gt;
    &lt;span class="na"&gt;target&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;duckdb_group&lt;/span&gt;

  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;dashboard_queries&lt;/span&gt;
    &lt;span class="na"&gt;match&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;protocol&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mysql&lt;/span&gt;
    &lt;span class="na"&gt;target&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;starrocks_group&lt;/span&gt;

  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;heavy_analytics&lt;/span&gt;
    &lt;span class="na"&gt;match&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;query_regex&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;JOIN|GROUP&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;BY|WINDOW"&lt;/span&gt;
    &lt;span class="na"&gt;target&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;trino_group&lt;/span&gt;

  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;fallback&lt;/span&gt;
    &lt;span class="na"&gt;fallback&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
    &lt;span class="na"&gt;target&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;athena_group&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The point is not that every deployment should use this exact policy. The point is that the policy becomes explicit, traceable, and shared.&lt;/p&gt;

&lt;p&gt;That alone removes a surprising amount of hidden operational drag.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cluster groups make routing operational
&lt;/h2&gt;

&lt;p&gt;Once a route resolves to a cluster group, QueryFlux handles execution there.&lt;/p&gt;

&lt;p&gt;It supports these load-balancing strategies:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;roundRobin&lt;/li&gt;
&lt;li&gt;leastLoaded&lt;/li&gt;
&lt;li&gt;failover&lt;/li&gt;
&lt;li&gt;engineAffinity&lt;/li&gt;
&lt;li&gt;weighted&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It also supports:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;per-group concurrency limits&lt;/li&gt;
&lt;li&gt;proxy-side queueing when groups are full&lt;/li&gt;
&lt;li&gt;health-aware cluster selection&lt;/li&gt;
&lt;li&gt;background health checks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is where QueryFlux starts to feel deeper than a typical proxy.&lt;/p&gt;

&lt;p&gt;It is not only deciding where a query should go. It is also giving operators a place to control how traffic behaves when systems are under load, how overflow is absorbed, and how healthy capacity is chosen.&lt;/p&gt;

&lt;p&gt;That is the part that makes the system practical.&lt;/p&gt;
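
&lt;p&gt;A rough sketch of what health-aware, least-loaded selection under a per-group concurrency cap means in practice (names and structures here are illustrative, not QueryFlux's API):&lt;/p&gt;

```python
# Hypothetical sketch of "leastLoaded" selection with a per-group
# concurrency limit. Field names are illustrative, not QueryFlux's API.

def pick_cluster(clusters, max_concurrent):
    """Return the healthy cluster with the fewest running queries,
    or None when the group is at its concurrency limit (the proxy
    would then queue the query instead of failing it)."""
    healthy = [c for c in clusters if c["healthy"]]
    total_running = sum(c["running"] for c in healthy)
    if not healthy or total_running >= max_concurrent:
        return None  # caller queues the query at the proxy
    return min(healthy, key=lambda c: c["running"])
```

&lt;p&gt;The other strategies differ only in the selection step: round-robin cycles an index, weighted draws proportionally, failover prefers a primary while it stays healthy.&lt;/p&gt;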

&lt;h2&gt;
  
  
  SQL translation is built into the path
&lt;/h2&gt;

&lt;p&gt;Multi-engine routing is much more useful when SQL dialect differences do not immediately get in the way.&lt;/p&gt;

&lt;p&gt;QueryFlux integrates dialect-only translation through sqlglot. When needed, it can rewrite SQL into the target engine’s dialect during dispatch.&lt;/p&gt;

&lt;p&gt;That means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;clients can keep speaking the SQL they naturally emit&lt;/li&gt;
&lt;li&gt;QueryFlux can normalize for the backend that will actually execute the query&lt;/li&gt;
&lt;li&gt;teams do not need to maintain multiple versions of the same query only because engines differ&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The current design is disciplined here. What is implemented today is dialect-only translation. Schema-aware translation is explicitly on the roadmap.&lt;/p&gt;

&lt;p&gt;That is a good balance: the system is already useful now, and the path to deeper translation is clear.&lt;/p&gt;

&lt;h2&gt;
  
  
  Observability is part of the product, not an add-on
&lt;/h2&gt;

&lt;p&gt;A routing layer only works if operators can see what it is doing.&lt;/p&gt;

&lt;p&gt;QueryFlux includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Prometheus metrics&lt;/li&gt;
&lt;li&gt;Grafana dashboard&lt;/li&gt;
&lt;li&gt;Admin REST API&lt;/li&gt;
&lt;li&gt;QueryFlux Studio&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The current observability surface covers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;query counts&lt;/li&gt;
&lt;li&gt;query duration&lt;/li&gt;
&lt;li&gt;translation metrics&lt;/li&gt;
&lt;li&gt;running queries&lt;/li&gt;
&lt;li&gt;queued queries&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It also supports routing traces, which matters in practice. When you introduce a routing layer, one of the first questions engineers ask is: why did this query land there? QueryFlux has a real answer to that.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where this becomes useful quickly
&lt;/h2&gt;

&lt;p&gt;The value of QueryFlux is easier to see in real scenarios than in abstract feature lists.&lt;/p&gt;

&lt;h3&gt;
  
  
  A multi-engine platform
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;BI tools connect through one access layer&lt;/li&gt;
&lt;li&gt;different workloads are routed to the engines they fit best&lt;/li&gt;
&lt;li&gt;backend topology becomes configuration instead of client code&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Dashboard SLA protection
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;low-latency groups can be protected with concurrency limits&lt;/li&gt;
&lt;li&gt;overflow can queue or spill instead of degrading the serving path&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Incremental engine migration
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;weighted routing makes gradual traffic shifts possible&lt;/li&gt;
&lt;li&gt;clients do not need to change while the migration happens&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Mixed workloads on shared data
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;batch, interactive, and exploratory traffic can be separated by policy&lt;/li&gt;
&lt;li&gt;routing intent lives in one place instead of being spread across the stack&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are practical benefits. They show up immediately once a platform becomes multi-engine.&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting started is intentionally simple
&lt;/h2&gt;

&lt;p&gt;One of the nice things about the project is that the first-run experience is straightforward.&lt;/p&gt;

&lt;p&gt;A minimal setup looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/lakeops-org/queryflux.git
&lt;span class="nb"&gt;cd &lt;/span&gt;queryflux/examples/minimal-trino
docker compose up &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="nt"&gt;--wait&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That gives you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;QueryFlux on &lt;code&gt;http://localhost:8080&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Trino direct on &lt;code&gt;http://localhost:8081&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Admin API on &lt;code&gt;http://localhost:9000&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Studio on &lt;code&gt;http://localhost:3000&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Postgres on &lt;code&gt;localhost:5433&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can then send a simple query through the Trino HTTP frontend:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST http://localhost:8080/v1/statement &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"X-Trino-User: dev"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s2"&gt;"SELECT 42"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There are also examples for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a minimal in-memory setup&lt;/li&gt;
&lt;li&gt;a Prometheus + Grafana stack&lt;/li&gt;
&lt;li&gt;a full stack with Trino, StarRocks, and Iceberg-related services&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That combination is important. The system is conceptually ambitious, but the on-ramp is short.&lt;/p&gt;

&lt;p&gt;It feels like deep infrastructure without feeling heavy to try.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is already shipped
&lt;/h2&gt;

&lt;p&gt;QueryFlux already includes a substantial set of capabilities on the main branch:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Trino HTTP frontend&lt;/li&gt;
&lt;li&gt;PostgreSQL wire frontend&lt;/li&gt;
&lt;li&gt;MySQL wire frontend&lt;/li&gt;
&lt;li&gt;Arrow Flight SQL frontend&lt;/li&gt;
&lt;li&gt;Admin REST API&lt;/li&gt;
&lt;li&gt;Trino backend&lt;/li&gt;
&lt;li&gt;DuckDB backend&lt;/li&gt;
&lt;li&gt;StarRocks backend&lt;/li&gt;
&lt;li&gt;Athena backend&lt;/li&gt;
&lt;li&gt;ordered router chains and routing fallback&lt;/li&gt;
&lt;li&gt;route tracing support&lt;/li&gt;
&lt;li&gt;per-group concurrency limits&lt;/li&gt;
&lt;li&gt;proxy-side queueing&lt;/li&gt;
&lt;li&gt;multiple load-balancing strategies&lt;/li&gt;
&lt;li&gt;health-aware cluster selection&lt;/li&gt;
&lt;li&gt;dialect-only translation through sqlglot&lt;/li&gt;
&lt;li&gt;in-memory persistence&lt;/li&gt;
&lt;li&gt;PostgreSQL persistence&lt;/li&gt;
&lt;li&gt;authentication providers including none, static, OIDC, and LDAP&lt;/li&gt;
&lt;li&gt;authorization modes including allow-all, simple policy, and OpenFGA&lt;/li&gt;
&lt;li&gt;Prometheus metrics&lt;/li&gt;
&lt;li&gt;Grafana dashboard&lt;/li&gt;
&lt;li&gt;QueryFlux Studio&lt;/li&gt;
&lt;li&gt;dynamic config reload from Postgres&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This matters because the project already feels like infrastructure, not just an idea.&lt;/p&gt;

&lt;h2&gt;
  
  
  What comes next
&lt;/h2&gt;

&lt;p&gt;The roadmap extends the same core design.&lt;/p&gt;

&lt;p&gt;Near-term work includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;schema-aware SQL translation&lt;/li&gt;
&lt;li&gt;ClickHouse backend and HTTP frontend&lt;/li&gt;
&lt;li&gt;richer routing telemetry in Studio&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Medium-term work includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;cost- and performance-aware routing&lt;/li&gt;
&lt;li&gt;Snowflake backend&lt;/li&gt;
&lt;li&gt;BigQuery backend&lt;/li&gt;
&lt;li&gt;Redis persistence&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That roadmap makes sense. It deepens the same access layer instead of changing the project’s center of gravity.&lt;/p&gt;

&lt;h2&gt;
  
  
  Solving the data access side
&lt;/h2&gt;

&lt;p&gt;The most interesting thing about QueryFlux is not that it is a proxy.&lt;/p&gt;

&lt;p&gt;It is that it is a carefully placed layer in a part of the modern data stack that is still surprisingly underbuilt.&lt;/p&gt;

&lt;p&gt;Open table formats solved the data side.&lt;/p&gt;

&lt;p&gt;QueryFlux is solving the access side.&lt;/p&gt;

&lt;p&gt;That creates an appealing combination:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;conceptually clean architecture&lt;/li&gt;
&lt;li&gt;obvious operational benefits&lt;/li&gt;
&lt;li&gt;room for sophisticated policy and routing logic&lt;/li&gt;
&lt;li&gt;low-friction adoption path&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It feels like the kind of infrastructure that becomes more valuable as the rest of the stack becomes more heterogeneous.&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting started
&lt;/h2&gt;

&lt;p&gt;Once a data platform becomes multi-engine, the missing piece is usually not another engine.&lt;br&gt;&lt;br&gt;
It is the access layer.&lt;br&gt;&lt;br&gt;
Clients still need to know where to connect. Routing still leaks into tools and applications. SQL dialect differences still show up at the edges. Capacity handling is still fragmented.&lt;br&gt;&lt;br&gt;
QueryFlux gives that layer a shape.&lt;br&gt;&lt;br&gt;
It makes multi-engine access easier to reason about, easier to operate, and easier to evolve.&lt;br&gt;&lt;br&gt;
That is why it is a compelling project: the idea is deep, the benefits are immediate, and the first experience is simple.&lt;br&gt;&lt;br&gt;
To try it out visit: &lt;a href="https://queryflux.dev/" rel="noopener noreferrer"&gt;https://queryflux.dev/&lt;/a&gt;&lt;/p&gt;

</description>
      <category>database</category>
      <category>dataengineering</category>
      <category>devops</category>
      <category>opensource</category>
    </item>
    <item>
      <title>11 Compaction Optimizations for Iceberg Data Lakes</title>
      <dc:creator>Joni Sar</dc:creator>
      <pubDate>Mon, 16 Feb 2026 12:55:54 +0000</pubDate>
      <link>https://dev.to/jonisar/11-compaction-optimizations-for-iceberg-data-lakes-52h2</link>
      <guid>https://dev.to/jonisar/11-compaction-optimizations-for-iceberg-data-lakes-52h2</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F742rq0qi27n0m42m1rka.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F742rq0qi27n0m42m1rka.png" alt="An Iceberg control plane provides automated, optimized compaction" width="800" height="333"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Compaction should provide an easy solution to a very difficult problem: controlling file count, keeping the cost of delete files from dominating read costs, and keeping metadata growth from turning every query plan into a deep, time-consuming walk through snapshots and manifests.&lt;/p&gt;

&lt;p&gt;Compaction is the data layer's mechanism for solving these issues, but only if it runs under a defined set of rules: what scope each compaction covers, what thresholds trigger a run, and how runs are synchronized with snapshot expiration and manifest maintenance.&lt;/p&gt;

&lt;p&gt;Compaction can be run manually via scripts and schedules, or automatically by a control plane.&lt;/p&gt;

&lt;p&gt;Manual scripts can manage compaction effectively for a small number of tables and a single engine. But as soon as there are many tables, or multiple engines, the manual process becomes guesswork: scripts may rewrite too much, run too infrequently, or interfere with ongoing ingestion, and they churn both snapshots and manifests.&lt;/p&gt;

&lt;p&gt;A control plane flips this model completely around.&lt;/p&gt;

&lt;p&gt;Instead of rewriting everything all the time, a control plane continuously monitors the health and workload characteristics of tables. Then, only when necessary, a control plane spends rewrite budget on the parts of the tables that actually change performance or cost, while also managing the entire lifecycle of maintaining the table.&lt;/p&gt;

&lt;p&gt;This article will teach you how to run compaction the way production lakes do: how to choose your baseline strategy (bin-packing vs. sorting), how to avoid rewriting healthy partitions, how to limit the scope of each compaction so maintenance remains invisible, how to focus on hot and delete-heavy areas first, how to keep the continuous commit cadence of streaming data from becoming a snapshot factory, and how to synchronize compaction with the metadata cleanup needed for stable query planning.&lt;/p&gt;

&lt;p&gt;For further reading on other aspects of optimizing and maintaining Iceberg, please also refer to:&lt;/p&gt;


&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://overcast.blog/7-best-compaction-engines-for-apache-iceberg/" rel="noopener noreferrer"&gt;7 Best Compaction Engines for Apache Iceberg&lt;/a&gt;  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://overcast.blog/11-iceberg-performance-optimizations-you-should-know/" rel="noopener noreferrer"&gt;11 Iceberg Performance Optimizations You Should Know&lt;/a&gt;  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://overcast.blog/9-apache-iceberg-table-maintenance-tools-you-should-know/" rel="noopener noreferrer"&gt;9 Apache Iceberg Table Maintenance Tools You Should Know&lt;/a&gt;  &lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let's move on to the compaction strategies that actually work in real production lakes.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Add a Control Plane for 20x Faster Compaction and Optimized Table Maintenance
&lt;/h2&gt;

&lt;p&gt;If you add a control plane to your lake, LakeOps brings an intelligent compaction engine and will also manage and optimize table maintenance and lake operations for you.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnwo5b6j2npuehmuta4r3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnwo5b6j2npuehmuta4r3.png" alt=" " width="800" height="452"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Snowflake-like experience for Iceberg with 10x performance (source: lakeops.dev)&lt;/p&gt;

&lt;p&gt;Instead of being confined to fixed schedules, LakeOps operates as a control plane for Iceberg tables that knows when, what, and how to compact. It treats compaction as a continuous operational problem rather than a periodic batch job, and optimizes it in real time.&lt;/p&gt;

&lt;p&gt;It analyzes telemetry data from query engines and Iceberg catalogs and uses that data to decide when compaction is actually needed, what to compact, and how. It takes actual usage patterns into account as well.&lt;/p&gt;

&lt;p&gt;Under the hood, LakeOps uses a dedicated Rust-based compaction engine that is designed specifically for Iceberg layouts and metadata behavior. Compaction is coordinated with snapshot expiration, manifest rewrites, orphan cleanup, and statistics maintenance so these operations reinforce each other instead of fighting.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcmtgryzwbhhygpmnmn9s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcmtgryzwbhhygpmnmn9s.png" alt=" " width="800" height="570"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The results are ~20x faster compaction, ~15x faster queries, and ~80% CPU/storage cost savings.&lt;/p&gt;

&lt;p&gt;🚢 Apache Iceberg compaction is not “background maintenance.” It’s a time-critical optimization problem that directly impacts query latency, metadata growth, and infrastructure cost.&lt;/p&gt;

&lt;p&gt;Learn more about it here:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://www.linkedin.com/posts/amit-gilad_apache-iceberg-compaction-time-critical-optimization-activity/" rel="noopener noreferrer"&gt;Apache Iceberg Compaction: Time-Critical Optimization | Amit Gilad&lt;/a&gt; &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In addition to compaction, LakeOps gives you control with manual and autopilot modes for all maintenance operations on your tables and coordinates them with compaction. That includes expiring snapshots, manifest rewrites, orphan file cleanup, and more.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq0rrirvr0hsr1al47v0r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq0rrirvr0hsr1al47v0r.png" alt=" " width="800" height="543"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Real-time compaction and maintenance optimization with a control plane (source: lakeops.dev)&lt;/p&gt;

&lt;p&gt;You can choose between manual mode and autopilot, per table or for groups of tables, to control compaction and maintenance processes.&lt;/p&gt;

&lt;p&gt;LakeOps also lets you define policies across the lake to enforce your standards, and provides you with dashboards to see and manage all compaction and maintenance processes per table and for the entire lake.&lt;/p&gt;

&lt;p&gt;Learn more: &lt;a href="https://lakeops.dev" rel="noopener noreferrer"&gt;https://lakeops.dev&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Use bin-pack as the baseline correction
&lt;/h2&gt;

&lt;p&gt;Most Iceberg tables do not require complex layout schemes. Nearly all of them, however, suffer from file fragmentation.&lt;/p&gt;

&lt;p&gt;Before attempting to fix the problem with sort-based layouts, clustering, or partitioning, look at the most obvious source of fragmentation: the write path itself. In almost all cases, the initial performance decline comes from streaming ingestion landing very small batches, micro-batch commits happening very frequently, and backfill data arriving in unevenly sized chunks.&lt;/p&gt;

&lt;p&gt;As a result, many small Parquet files accumulate within each partition. None of these files are "broken," and queries still return accurate answers. But as the count grows, planning time increases, task-scheduling overhead grows, and the number of object store calls climbs.&lt;/p&gt;

&lt;p&gt;This is not a layout issue; it is a file count issue.&lt;/p&gt;

&lt;p&gt;The easiest and most reliable way to solve the file count issue is bin-pack compaction. It combines small data files into a smaller number of larger, properly sized files. It does not alter the existing sort order or re-cluster data; it merely normalizes file sizes and reduces the metadata overhead of a high file count.&lt;/p&gt;

&lt;p&gt;In practice, this is usually sufficient.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why This Approach to Compaction Really Does Improve Performance
&lt;/h3&gt;

&lt;p&gt;Iceberg engines operate at the file level. The more files a table has, the more the engine must plan:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;more manifest entries must be read&lt;/li&gt;
&lt;li&gt;more file footers must be inspected&lt;/li&gt;
&lt;li&gt;more scan tasks must be scheduled&lt;/li&gt;
&lt;li&gt;more file references to delete must be tracked&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As the number of files grows, so does planning time. Bin-pack compaction reduces the number of physical files while preserving the existing logical layout. The result is fewer planning reads and fewer tasks to schedule, without requiring an additional shuffle.&lt;/p&gt;

&lt;p&gt;A good rule of thumb for most production tables is to target file sizes between 128 MB and 512 MB. The specific range depends on the engine and the workload. What matters is consistency.&lt;/p&gt;
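&lt;p&gt;The long byte values that appear in the SQL comments in this article are just these sizes written out; a quick sanity check:&lt;/p&gt;

```python
# The raw byte values behind the size comments in the SQL examples.
MB = 1024 * 1024
min_file_size = 128 * MB      # 134217728 bytes ('min-file-size-bytes')
target_file_size = 512 * MB   # 536870912 bytes ('target-file-size-bytes')
```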
&lt;h3&gt;
  
  
  Start with the Default Rewriting Method
&lt;/h3&gt;

&lt;p&gt;Unless you specify otherwise, Iceberg uses the bin-pack rewriting method:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CALL&lt;/span&gt; &lt;span class="n"&gt;prod&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;system&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rewrite_data_files&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt; 
  &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'db.events'&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Verify the effectiveness of the rewrite:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;file_count&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
  &lt;span class="k"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_size_in_bytes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;total_size_gb&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;prod&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;events&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;files&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You want to see fewer files and the same total size. If the total size changes substantially, something other than file fragmentation is going on.&lt;/p&gt;
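&lt;p&gt;If you automate this check, the invariant is easy to encode. The helper below is a hypothetical sketch: it flags a rewrite as healthy only when the file count dropped and total bytes stayed roughly constant.&lt;/p&gt;

```python
# Hypothetical post-compaction sanity check: fewer files, same total size.

def compaction_looks_healthy(before, after, tolerance=0.02):
    """before/after are (file_count, total_bytes) tuples, e.g. captured
    from the table's `files` metadata before and after the rewrite."""
    fewer_files = after[0] < before[0]
    size_drift = abs(after[1] - before[1]) / before[1]
    return fewer_files and size_drift <= tolerance

# 1,800 small files compacted down to 120, total bytes unchanged:
ok = compaction_looks_healthy((1800, 10**12), (120, 10**12))
```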

&lt;h3&gt;
  
  
  Specify Your Target File Size
&lt;/h3&gt;

&lt;p&gt;If file fragmentation continues, specify a target file size at the table level:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;prod&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;events&lt;/span&gt; &lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;TBLPROPERTIES&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt; 
  &lt;span class="s1"&gt;'write.target-file-size-bytes'&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'536870912'&lt;/span&gt; &lt;span class="c1"&gt;-- 512MB&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And then perform the rewrite with the same target file size:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CALL&lt;/span&gt; &lt;span class="n"&gt;prod&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;system&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rewrite_data_files&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt; 
  &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'db.events'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
  &lt;span class="k"&gt;options&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt; 
    &lt;span class="s1"&gt;'target-file-size-bytes'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'536870912'&lt;/span&gt; 
  &lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Without an explicit target, different engines and writers will produce files of varying sizes, and subsequent compactions will keep having to normalize them back to target.&lt;/p&gt;

&lt;h3&gt;
  
  
  Define Thresholds to Prevent Unnecessary Rewrites
&lt;/h3&gt;

&lt;p&gt;At scale, performing rewrites on healthy data wastes compute resources. Define the following thresholds to prevent unnecessary rewrites:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CALL&lt;/span&gt; &lt;span class="n"&gt;prod&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;system&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rewrite_data_files&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt; 
  &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'db.events'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
  &lt;span class="k"&gt;options&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt; 
    &lt;span class="s1"&gt;'min-input-files'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'5'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="s1"&gt;'min-file-size-bytes'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'134217728'&lt;/span&gt; &lt;span class="c1"&gt;-- 128MB &lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This ensures that only file groups containing at least five input files are rewritten. Partitions that already hold one or two reasonably sized files are left alone.&lt;/p&gt;


&lt;h2&gt;
  
  
  3. Conditional Compaction
&lt;/h2&gt;

&lt;p&gt;Another way to waste compute cycles in an Iceberg lake is to blindly compact data on a regular basis.&lt;/p&gt;

&lt;p&gt;It usually begins innocently. A periodic rewrite job is created to "keep things tidy." For a while, it appears to help. Eventually, however, it begins rewriting partitions that were already healthy. Each rewrite generates new files, creates a new snapshot, and updates the manifest list. None of the files are "broken," but the system is spending compute on rewrites that do not improve performance.&lt;/p&gt;

&lt;p&gt;Compaction should be used as a corrective measure, not as a routine activity.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why Unconditional Rewrites Cause Churn
&lt;/h3&gt;

&lt;p&gt;Each time you rewrite data files, Iceberg:&lt;/p&gt;

&lt;p&gt;Creates new data files&lt;br&gt;&lt;br&gt;
Generates a new snapshot&lt;br&gt;&lt;br&gt;
Updates manifest lists&lt;br&gt;&lt;br&gt;
Increases metadata history&lt;/p&gt;

&lt;p&gt;If the files being rewritten are already close to their target size, you are essentially cycling data through the system. Over time, that churn deepens the metadata history and lengthens planning.&lt;/p&gt;

&lt;p&gt;At scale, this overhead becomes noticeable.&lt;/p&gt;

&lt;p&gt;Your objective is not to compact frequently. It is to compact when the layout is measurably unhealthy.&lt;/p&gt;
&lt;h3&gt;
  
  
  Implement Gateways to Control Rewrites
&lt;/h3&gt;

&lt;p&gt;Iceberg's rewrite_data_files procedure lets you gate rewrites behind eligibility conditions. The most effective is min-input-files.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CALL&lt;/span&gt; &lt;span class="n"&gt;prod&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;system&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rewrite_data_files&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt; 
  &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'db.events'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
  &lt;span class="k"&gt;options&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt; 
    &lt;span class="s1"&gt;'min-input-files'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'3'&lt;/span&gt; 
  &lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Using this condition, Iceberg will only rewrite file groups that contain at least three files. Partitions that already have one or two files that are of reasonable size are excluded.&lt;/p&gt;

&lt;p&gt;This is a relatively small change to make, but in large lakes, it will significantly reduce unnecessary compaction.&lt;/p&gt;

&lt;p&gt;You can implement additional conditions to make this gateway even tighter:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CALL&lt;/span&gt; &lt;span class="n"&gt;prod&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;system&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rewrite_data_files&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt; 
  &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'db.events'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
  &lt;span class="k"&gt;options&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt; 
    &lt;span class="s1"&gt;'min-input-files'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'5'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="s1"&gt;'min-file-size-bytes'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'134217728'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;-- 128MB &lt;/span&gt;
    &lt;span class="s1"&gt;'target-file-size-bytes'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'536870912'&lt;/span&gt; &lt;span class="c1"&gt;-- 512MB &lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Compaction will now only run when the following conditions are met:&lt;/p&gt;

&lt;p&gt;There are sufficient small files to warrant consolidation&lt;br&gt;&lt;br&gt;
Files are currently below a reasonable size threshold&lt;br&gt;&lt;br&gt;
There is a valid target to normalize to&lt;/p&gt;

&lt;p&gt;This converts compaction from an automatic rewrite into a targeted repair operation.&lt;/p&gt;
&lt;h3&gt;
  
  
  Let the Table State Drive the Decision
&lt;/h3&gt;

&lt;p&gt;Prior to initiating a compaction operation, review the distribution of files:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; 
  &lt;span class="k"&gt;partition&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;event_date&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
  &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;file_count&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
  &lt;span class="k"&gt;avg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_size_in_bytes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;avg_mb&lt;/span&gt; 
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;prod&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;events&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;files&lt;/span&gt; 
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="k"&gt;partition&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;event_date&lt;/span&gt; 
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;file_count&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt; 
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If a partition has four files averaging 480 MB and your target file size is 512 MB, rewriting the partition will not materially affect either planning or scan time.&lt;/p&gt;

&lt;p&gt;However, if another partition has 180 files averaging 25 MB, that partition is a prime candidate for compaction.&lt;/p&gt;

&lt;p&gt;Compaction decisions should be based on signals such as this. Not a schedule.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Limit Rewrite Scope Per Run
&lt;/h2&gt;

&lt;p&gt;Big compaction jobs appear great on paper. In reality, they are among the easiest ways to create instability in a production lake.&lt;/p&gt;

&lt;p&gt;Backfills, partition evolutions, or long stretches of time without maintenance can quickly turn terabytes of data into rewrite candidates. If you don't specify any boundaries, Iceberg will rewrite everything that meets its criteria. The outcome is well understood: long-running jobs, significant shuffles, large amounts of object store I/O, significant increases in snapshot sizes, and sometimes even cluster contention with users' queries.&lt;/p&gt;

&lt;p&gt;Compaction operates at its best when it is incremental.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why "Rewrite Everything" Is A Risk
&lt;/h3&gt;

&lt;p&gt;When you rewrite a large section of a table in a single pass, you are executing a number of costly operations simultaneously:&lt;/p&gt;

&lt;p&gt;Reading numerous data files&lt;br&gt;&lt;br&gt;
Shuffling and rewriting them&lt;br&gt;&lt;br&gt;
Creating many new files&lt;br&gt;&lt;br&gt;
Committing a large new snapshot&lt;br&gt;&lt;br&gt;
Possibly rewriting manifests&lt;/p&gt;

&lt;p&gt;Even when it succeeds, you have produced a major maintenance event. If it fails partway through, you have wasted the compute and extended your maintenance window.&lt;/p&gt;

&lt;p&gt;Operationally, smaller and more frequent corrections are safer than infrequent large-scale rewrites.&lt;/p&gt;
&lt;h3&gt;
  
  
  Restrict rewrite size explicitly
&lt;/h3&gt;

&lt;p&gt;The rewrite_data_files procedure accepts options that limit how much work a single run performs.&lt;/p&gt;

&lt;p&gt;For example, you can cap the number of file group rewrites per run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CALL&lt;/span&gt; &lt;span class="n"&gt;prod&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;system&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rewrite_data_files&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'db.events'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;options&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s1"&gt;'max-file-group-rewrites'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'20'&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This caps how many file groups are rewritten in a single invocation of rewrite_data_files. Instead of rewriting hundreds of partitions at once, you run a controlled sequence of bounded passes.&lt;/p&gt;
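&lt;p&gt;The arithmetic of bounded passes is straightforward. As a sketch (plain Python, with the actual rewrite call abstracted away), draining a backlog of fragmented file groups with a per-run cap looks like this:&lt;/p&gt;

```python
# Sketch of incremental maintenance: drain a backlog of eligible file
# groups in bounded passes rather than one enormous rewrite. Each pass
# stands in for one capped rewrite_data_files run.

def passes_needed(eligible_group_count, max_groups_per_pass):
    """How many bounded runs it takes to compact the whole backlog."""
    passes = 0
    remaining = eligible_group_count
    while remaining > 0:
        remaining -= min(remaining, max_groups_per_pass)
        passes += 1
    return passes

# 87 fragmented file groups, capped at 20 group rewrites per run:
runs = passes_needed(87, 20)   # 5 bounded runs instead of one huge job
```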

&lt;p&gt;You can also utilize these with eligibility thresholds:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CALL&lt;/span&gt; &lt;span class="n"&gt;prod&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;system&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rewrite_data_files&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'db.events'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;options&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s1"&gt;'min-input-files'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'5'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'max-file-group-rewrites'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'20'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'target-file-size-bytes'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'536870912'&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Compaction now becomes predictable. Each pass of compaction corrects a finite amount of drift.&lt;/p&gt;

&lt;h3&gt;
  
  
  Disperse correction over cycles
&lt;/h3&gt;

&lt;p&gt;If a table has accumulated months of small files from high-frequency writes, there is rarely a good reason to try to "fix everything tonight."&lt;/p&gt;

&lt;p&gt;Instead, a more sustainable, steady-state approach is:&lt;/p&gt;

&lt;p&gt;Compact data with limited rewrite scope.&lt;br&gt;&lt;br&gt;
Permit normal operation of user workloads.&lt;br&gt;&lt;br&gt;
Repeat on the subsequent maintenance cycle.&lt;/p&gt;

&lt;p&gt;Within a couple of cycles, fragmentation will decrease significantly, without generating a maintenance peak.&lt;/p&gt;

&lt;p&gt;This also produces less "snapshot shock." Instead of a large, single-pass rewrite snapshot replacing nearly half of the table, you generate a series of smaller, incremental snapshots.&lt;/p&gt;
&lt;h3&gt;
  
  
  Prioritize rather than rewrite randomly
&lt;/h3&gt;

&lt;p&gt;Once you limit rewrite scope, prioritization becomes essential.&lt;/p&gt;

&lt;p&gt;Practically speaking, you want to rewrite:&lt;/p&gt;

&lt;p&gt;Partition groups with the most files.&lt;br&gt;&lt;br&gt;
Partition groups with the smallest average file size.&lt;br&gt;&lt;br&gt;
Partition groups with the most accumulated deletes.&lt;br&gt;&lt;br&gt;
Partition groups with the heaviest query traffic.&lt;/p&gt;

&lt;p&gt;You can find the worst offending partition groups using the following query:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="k"&gt;partition&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;event_date&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;file_count&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;avg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_size_in_bytes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;avg_mb&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;prod&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;events&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;files&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="k"&gt;partition&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;event_date&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;file_count&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then either target the offending partition groups directly with a where predicate, or let your orchestration layer decide which groups to rewrite first for maximum impact.&lt;/p&gt;

&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CALL&lt;/span&gt; &lt;span class="n"&gt;prod&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;system&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rewrite_data_files&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'db.events'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'event_date &amp;gt;= DATE &lt;/span&gt;&lt;span class="se"&gt;''&lt;/span&gt;&lt;span class="s1"&gt;2026-01-01&lt;/span&gt;&lt;span class="se"&gt;''&lt;/span&gt;&lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;options&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s1"&gt;'max-file-group-rewrites'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'10'&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This limits both the logical scope (only recent partitions) and the physical rewrite volume.&lt;/p&gt;
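&lt;p&gt;In an orchestration layer, the prioritization can be a few lines of glue. The helper below is a hypothetical sketch: it takes rows shaped like the output of the files-metadata query above (partition, file_count, avg_mb) and returns the highest-impact candidates first.&lt;/p&gt;

```python
# Hypothetical orchestration helper: rank partitions from the `files`
# metadata query and return the highest-impact compaction candidates.

def pick_compaction_targets(partition_stats, min_files=5, max_avg_mb=128, limit=10):
    """partition_stats: list of (partition, file_count, avg_mb) rows."""
    candidates = [
        row for row in partition_stats
        if row[1] >= min_files and row[2] < max_avg_mb
    ]
    # Worst offenders first: most files, then smallest average size.
    candidates.sort(key=lambda row: (-row[1], row[2]))
    return [row[0] for row in candidates[:limit]]

stats = [
    ("2026-02-09", 180, 25.0),  # badly fragmented: compact first
    ("2026-02-08", 42, 96.0),   # fragmented: compact next
    ("2026-02-01", 4, 480.0),   # healthy: leave alone
]
targets = pick_compaction_targets(stats)
# -> ["2026-02-09", "2026-02-08"]
```

Each returned partition can then be fed into a where predicate for a bounded rewrite run.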

&lt;h3&gt;
  
  
  Keep Maintenance Invisible to Users
&lt;/h3&gt;

&lt;p&gt;The ultimate objective of limiting rewrite scope is not merely cluster stability. It is predictability.&lt;/p&gt;

&lt;p&gt;When each compaction run is small and bounded:&lt;/p&gt;

&lt;p&gt;Maintenance windows are brief.&lt;br&gt;&lt;br&gt;
Resource spikes are under control.&lt;br&gt;&lt;br&gt;
Snapshots grow gradually.&lt;br&gt;&lt;br&gt;
Query performance improves incrementally, rather than suddenly.&lt;/p&gt;

&lt;p&gt;In production lakes, stability is usually more important than the rate of correction. Incremental correction is generally preferred to dramatic restructuring.&lt;/p&gt;
&lt;h2&gt;
  
  
  5. Focus On Hot Partition Groups
&lt;/h2&gt;

&lt;p&gt;In the majority of Iceberg tables, compaction impact is not uniformly distributed.&lt;/p&gt;

&lt;p&gt;A small set of partitions is responsible for most of the pain: they receive the most writes (so they fragment fastest) and the most reads (so every additional file shows up as both planning and scan overhead). If you rewrite only the "hot" partitions, you typically gain 80% of the benefit for a fraction of the rewrite volume.&lt;/p&gt;

&lt;p&gt;The simplest approach to achieve this is to treat compaction as a rolling window problem.&lt;/p&gt;
&lt;h3&gt;
  
  
  "Hot" Generally Means Two Things
&lt;/h3&gt;

&lt;p&gt;Hot partitions generally represent partitions that are still active:&lt;/p&gt;

&lt;p&gt;They are still receiving new files from streaming or micro-batch systems&lt;br&gt;&lt;br&gt;
They are the partitions that your analysts / dashboards / downstream jobs are accessing constantly.&lt;/p&gt;

&lt;p&gt;This results in two operational principles:&lt;/p&gt;

&lt;p&gt;Compact relatively recent partitions regularly, since they will accumulate the most small files.&lt;br&gt;&lt;br&gt;
Do not compact the actively written partitions unless you know you can tolerate collisions with writers.&lt;/p&gt;

&lt;p&gt;AWS gives the same guidance for Iceberg compaction: use a where predicate to exclude actively written partitions, so you avoid data conflicts with writers and leave only metadata conflicts, which Iceberg can normally resolve.&lt;/p&gt;
&lt;h3&gt;
  
  
  Identify Your Rolling Window With Where
&lt;/h3&gt;

&lt;p&gt;Iceberg's Spark procedure provides a where predicate for filtering which files (and hence which partitions) qualify for rewriting.&lt;/p&gt;

&lt;p&gt;An extremely common use case is "Compact everything older than the current ingest window":&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CALL&lt;/span&gt; &lt;span class="n"&gt;prod&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;system&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rewrite_data_files&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'db.events'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'event_date &amp;lt; DATE &lt;/span&gt;&lt;span class="se"&gt;''&lt;/span&gt;&lt;span class="s1"&gt;2026-02-10&lt;/span&gt;&lt;span class="se"&gt;''&lt;/span&gt;&lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;options&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s1"&gt;'target-file-size-bytes'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'536870912'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'min-input-files'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'5'&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This keeps compaction away from the partitions that are currently being written, while continually cleaning up yesterday's data and older.&lt;/p&gt;

&lt;p&gt;If your table is partitioned hourly, apply the same idea at hourly granularity:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CALL&lt;/span&gt; &lt;span class="n"&gt;prod&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;system&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rewrite_data_files&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'db.events'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'event_hour &amp;lt; TIMESTAMP &lt;/span&gt;&lt;span class="se"&gt;''&lt;/span&gt;&lt;span class="s1"&gt;2026-02-11 12:00:00&lt;/span&gt;&lt;span class="se"&gt;''&lt;/span&gt;&lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;options&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s1"&gt;'target-file-size-bytes'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'536870912'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'min-input-files'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'5'&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The main principle here is not the specific cut-off. It is maintaining a buffer so that compaction does not conflict with ingestion.&lt;/p&gt;
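&lt;p&gt;The rolling window is easy to automate. A minimal Python sketch that builds the cut-off predicate for a scheduled job (the &lt;code&gt;event_date&lt;/code&gt; column name and the one-day buffer are illustrative):&lt;/p&gt;

```python
from datetime import date, timedelta

def compaction_cutoff_predicate(today: date, buffer_days: int = 1) -> str:
    """Build a rolling-window predicate that excludes the current ingest window.

    buffer_days controls how close to "now" compaction is allowed to reach.
    """
    cutoff = today - timedelta(days=buffer_days)
    return f"event_date < DATE '{cutoff.isoformat()}'"

# With a one-day buffer, compaction on 2026-02-11 stops at 2026-02-10.
print(compaction_cutoff_predicate(date(2026, 2, 11)))
# event_date < DATE '2026-02-10'
```

&lt;p&gt;Feed the returned string to the &lt;code&gt;where&lt;/code&gt; argument of &lt;code&gt;rewrite_data_files&lt;/code&gt; and the window advances on its own each day.&lt;/p&gt;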

&lt;h3&gt;
  
  
  Locate the Worst Partition Groups First
&lt;/h3&gt;

&lt;p&gt;Even within the "hot-ish" window, not all partition groups are equally bad. You can typically find the worst offender simply by examining the file count and average size:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="k"&gt;partition&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;event_date&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;file_count&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;avg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_size_in_bytes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;avg_mb&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;prod&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;events&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;files&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="k"&gt;partition&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;event_date&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;file_count&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you plan to compact only a limited portion of the data per pass (as you probably should), this query shows where you will get the largest initial benefit.&lt;/p&gt;

&lt;h3&gt;
  
  
  If You Must Compact Partition Groups That Are Still Receiving Late Data
&lt;/h3&gt;

&lt;p&gt;Some workloads receive late-arriving events, updates, or merges that keep older partition groups "active." If you compact them regardless, you will periodically collide with writers.&lt;/p&gt;

&lt;p&gt;Iceberg includes a partial progress mode that commits compaction in smaller chunks rather than one large commit, so a conflict only forces the retry of a small piece of work instead of the whole job.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CALL&lt;/span&gt; &lt;span class="n"&gt;prod&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;system&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rewrite_data_files&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'db.events'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'event_date &amp;gt;= DATE &lt;/span&gt;&lt;span class="se"&gt;''&lt;/span&gt;&lt;span class="s1"&gt;2026-02-01&lt;/span&gt;&lt;span class="se"&gt;''&lt;/span&gt;&lt;span class="s1"&gt; AND event_date &amp;lt; DATE &lt;/span&gt;&lt;span class="se"&gt;''&lt;/span&gt;&lt;span class="s1"&gt;2026-02-10&lt;/span&gt;&lt;span class="se"&gt;''&lt;/span&gt;&lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;options&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s1"&gt;'partial-progress.enabled'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'true'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'partial-progress.max-commits'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'10'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'max-concurrent-file-group-rewrites'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'10'&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You are trading "one clean commit" for "multiple smaller commits that fail more cheaply." In real production lakes with continuous write activity, that trade is usually worthwhile.&lt;/p&gt;

&lt;h3&gt;
  
  
  Where a Control Plane Helps
&lt;/h3&gt;

&lt;p&gt;Once you have many tables, "hot partition groups" stops being something you can track by intuition. You need a loop that continuously identifies the hot partition groups from your system's actual read/write activity and then applies the rolling-window idea to them.&lt;/p&gt;

&lt;p&gt;That is where a control plane such as LakeOps becomes useful: it is not adding a new compaction algorithm so much as deciding where to spend your rewrite budget based on real workload telemetry, and applying that decision consistently across hundreds of tables.&lt;/p&gt;
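&lt;p&gt;To make the idea concrete, here is a toy version of such a loop in plain Python. It is not LakeOps or any real product; the telemetry fields and thresholds are invented for illustration:&lt;/p&gt;

```python
from dataclasses import dataclass

@dataclass
class PartitionStats:
    name: str
    file_count: int
    avg_file_mb: float
    writes_last_hour: int  # ingest activity, from your own telemetry

def pick_rewrite_targets(stats, budget_files=5000, min_files=50, max_avg_mb=64):
    """Spend a fixed rewrite budget on the worst cold partitions.

    Partitions still receiving writes are skipped (the rolling-window idea);
    the rest are ranked by fragmentation. All thresholds are illustrative.
    """
    cold = [s for s in stats if s.writes_last_hour == 0]
    fragmented = [s for s in cold
                  if s.file_count >= min_files and s.avg_file_mb <= max_avg_mb]
    fragmented.sort(key=lambda s: s.file_count, reverse=True)

    targets, spent = [], 0
    for s in fragmented:
        if spent + s.file_count > budget_files:
            break
        targets.append(s.name)
        spent += s.file_count
    return targets
```

&lt;p&gt;Each selected partition then gets a scoped &lt;code&gt;rewrite_data_files&lt;/code&gt; call; anything still hot simply waits for the next pass.&lt;/p&gt;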

&lt;h2&gt;
  
  
  6. Sort or Z-Order When Scan Efficiency Is The Bottleneck
&lt;/h2&gt;

&lt;p&gt;Bin-packing compaction decreases the number of files. However, it does not affect how the data is organized internally within those files.&lt;/p&gt;

&lt;p&gt;If partitions are appropriately sized and file counts are reasonable, but selective queries scan far more data files than expected, the cause is likely clustering. The typical pattern: planning times are consistent and partition pruning works, yet filtered queries read a large percentage of the files within a given partition. This is where sort-based compaction becomes relevant.&lt;/p&gt;

&lt;p&gt;Query engines rely on per-file statistics (such as min and max values) to decide whether a file can be skipped. When data is written in random order, value ranges overlap heavily between files, so even selective predicates cannot exclude many of them. Sorting changes this: when data is ordered by a frequently filtered column, each file covers a narrower value range, more files can be skipped by the predicate, and fewer bytes are scanned.&lt;/p&gt;
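&lt;p&gt;A toy model makes the pruning effect easy to see. The tuples below stand in for per-file min/max statistics; this illustrates the mechanism, not Iceberg's actual metadata layout:&lt;/p&gt;

```python
def files_to_scan(file_stats, cutoff):
    """file_stats: list of (min_val, max_val) per file.

    A predicate "value < cutoff" can skip any file whose min is already
    >= cutoff, so only files whose range starts below the cutoff are read.
    """
    return [(lo, hi) for (lo, hi) in file_stats if lo < cutoff]

# Randomly written data: value ranges overlap, so nothing can be skipped.
random_layout = [(1, 95), (3, 99), (2, 97), (5, 98)]
# Sorted data: each file covers a narrow, mostly disjoint range.
sorted_layout = [(1, 25), (26, 50), (51, 75), (76, 99)]

assert len(files_to_scan(random_layout, 30)) == 4  # every file might match
assert len(files_to_scan(sorted_layout, 30)) == 2  # half the files pruned
```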

&lt;p&gt;If there is one column that dominates your predicates (e.g., event_time within a date-partitioned table), a simple sort-based rewrite is typically sufficient:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CALL&lt;/span&gt; &lt;span class="n"&gt;prod&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;system&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rewrite_data_files&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'db.events'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;strategy&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'sort'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;sort_order&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'event_time ASC NULLS LAST'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;options&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s1"&gt;'target-file-size-bytes'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'536870912'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'min-input-files'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'5'&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This rearranges the rows of data within files so that time-based predicates can remove more files early in the process. The effect is evident not only in terms of runtime, but also in the reduction in scanned bytes and the number of splits generated.&lt;/p&gt;

&lt;p&gt;If your workload filters on multiple columns (e.g., user_id, event_type, and occasionally device_type), Z-order is typically a superior choice:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CALL&lt;/span&gt; &lt;span class="n"&gt;prod&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;system&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rewrite_data_files&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'db.events'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;strategy&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'sort'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;sort_order&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'zorder(user_id, event_type)'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;options&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s1"&gt;'target-file-size-bytes'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'536870912'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'min-input-files'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'5'&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Z-ordering enhances locality across multiple dimensions. While Z-ordering will never perfectly optimize any individual column, it typically minimizes overall scan expansion when filter patterns vary.&lt;/p&gt;
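&lt;p&gt;For intuition, the classic Z-order construction interleaves the bits of the clustered columns, so rows that are close in either dimension end up close in the sort order. This is a simplified sketch of the curve itself, not Iceberg's implementation:&lt;/p&gt;

```python
def z_value(x: int, y: int, bits: int = 8) -> int:
    """Interleave the bits of two column values into a single Z-order key.

    Sorting rows by this key keeps rows that are near each other in (x, y)
    near each other on disk, which is what lets min/max stats prune on
    either column.
    """
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)      # x bits go to even positions
        z |= ((y >> i) & 1) << (2 * i + 1)  # y bits go to odd positions
    return z

# Neighbors in both dimensions map to nearby Z values.
assert [z_value(x, y) for (x, y) in [(0, 0), (1, 0), (0, 1), (1, 1)]] == [0, 1, 2, 3]
```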

&lt;p&gt;Important Note: Defining a sort order at the table level does not rewrite historical data. It only affects newly-written data. All existing files will remain unmodified until a rewrite occurs.&lt;/p&gt;

&lt;p&gt;It is typical to define a sort order and then expect improvements, only to discover that no changes occurred to the physical layout of the data.&lt;/p&gt;

&lt;p&gt;After a sort-based compaction, verify it properly: check how many files common predicates scan, and compare total bytes scanned before and after. Runtime is hard to measure in shared environments; file count and total bytes scanned are more reliable metrics.&lt;/p&gt;

&lt;p&gt;Sorting is more resource-intensive than bin-packing: it adds shuffle and CPU overhead during compaction. Applied blindly to every partition, its maintenance cost can exceed the query gains. Sorting works best when applied selectively: target high-traffic partitions, align the sort with real filter patterns, and apply rewrites incrementally.&lt;/p&gt;

&lt;p&gt;When scan efficiency is the primary bottleneck rather than file count, sorting or Z-order is one of the few techniques that will reliably enhance pruning. The key is to apply sorting or Z-order in a manner that aligns with the characteristics of your workload.&lt;/p&gt;

&lt;h2&gt;
  
  
  7. Compact delete-heavy partitions deliberately
&lt;/h2&gt;

&lt;p&gt;You can look at a table, see that file sizes are healthy, and believe compaction is handled - yet queries keep slowing down.&lt;/p&gt;

&lt;p&gt;A common cause is delete files.&lt;/p&gt;

&lt;p&gt;Iceberg does not rewrite data files when rows are modified or deleted. Instead, it stores position deletes or equality deletes alongside the data, and at read time the engine merges data files with their associated delete files. This makes writes efficient, but as delete files accumulate, every query pays extra cost.&lt;/p&gt;
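&lt;p&gt;A toy sketch of that merge-on-read behavior in plain Python (position deletes here are simply &lt;code&gt;(file, row_position)&lt;/code&gt; pairs; real delete files carry more structure):&lt;/p&gt;

```python
def read_with_deletes(data_files, position_deletes):
    """data_files: {file_name: [rows]}; position_deletes: set of (file_name, pos).

    Every scan must consult the delete set for every row - the extra work
    that accumulating delete files imposes on reads.
    """
    out = []
    for fname, rows in data_files.items():
        out.extend(row for pos, row in enumerate(rows)
                   if (fname, pos) not in position_deletes)
    return out

data = {"f1.parquet": ["a", "b", "c"], "f2.parquet": ["d", "e"]}
deletes = {("f1.parquet", 1), ("f2.parquet", 0)}
assert read_with_deletes(data, deletes) == ["a", "c", "e"]
```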

&lt;p&gt;The effect is subtle. File sizes appear fine. Bin-pack has already normalized fragmentation. Yet scan CPU climbs, and update-heavy partitions begin to perform worse than append-only partitions.&lt;/p&gt;

&lt;p&gt;You can usually verify this by examining the delete file distribution:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="k"&gt;partition&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;event_date&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;delete_file_count&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;prod&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;events&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;delete_files&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="k"&gt;partition&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;event_date&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;delete_file_count&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If certain partitions contain a high concentration of delete files, that is an indication that reads are performing more work than necessary.&lt;/p&gt;

&lt;p&gt;Iceberg supports delete-aware compaction. Instead of rewriting files solely based on size, you can set a threshold on the ratio of deleted rows and have Iceberg rewrite the data files that exceed it. For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CALL&lt;/span&gt; &lt;span class="n"&gt;prod&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;system&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rewrite_data_files&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'db.events'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;options&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s1"&gt;'delete-ratio-threshold'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'0.3'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'remove-dangling-deletes'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'true'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'target-file-size-bytes'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'536870912'&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this case, Iceberg will rewrite data files that are severely impacted by deletes and will remove the physical representation of deleted rows as well as the associated delete files.&lt;/p&gt;

&lt;p&gt;The practical result is that queries will no longer require the merging of as many delete files at runtime; CPU decreases; scan cost stabilizes; and planning becomes easier since there are fewer auxiliary files to track.&lt;/p&gt;

&lt;p&gt;This matters mostly for tables subject to upserts, CDC pipelines, or frequent merges. Append-only event tables rarely exhibit this behavior; dimension tables and slowly changing datasets often do.&lt;/p&gt;

&lt;p&gt;As with any other compaction strategy, stay focused. Combine delete thresholds, partition filters, and rewrite limits. There is no need to rewrite the entire table just because a handful of partitions see heavy delete activity.&lt;/p&gt;

&lt;p&gt;Healthy file size does not ensure healthy performance. If delete files comprise the majority of a partition, the compaction strategy must specifically target these delete files - otherwise, read cost will continue to increase regardless of whether the layout appears to be "correct".&lt;/p&gt;

&lt;h2&gt;
  
  
  8. Rewrite position delete files individually when needed
&lt;/h2&gt;

&lt;p&gt;Rewriting data files does not necessarily resolve all delete-related issues.&lt;/p&gt;

&lt;p&gt;In many update-intensive workloads, position delete files accumulate faster than data files are rewritten. Even if you run delete-aware compaction, you can still end up with a large number of position delete files attached to otherwise healthy data files.&lt;/p&gt;

&lt;p&gt;Even after a compaction pass reduces the number of data files, the engine still has to open and apply the delete files on every read. So as delete files accumulate, scan overhead stays higher than it should be.&lt;/p&gt;

&lt;p&gt;This is particularly prevalent in tables that receive regular upserts or merges. Tables that are subject to append-only inserts do not typically exhibit this type of behavior. However, CDC pipelines and dimension tables do.&lt;/p&gt;

&lt;p&gt;Iceberg permits the explicit rewriting of position delete files as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CALL&lt;/span&gt; &lt;span class="n"&gt;prod&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;system&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rewrite_position_delete_files&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'db.events'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This rewrites smaller delete files into fewer, larger ones and drops obsolete entries where possible. The objective is not merely to reduce the total file count, but to minimize the number of files the engine has to open during a read.&lt;/p&gt;

&lt;p&gt;You can also limit the scope of this rewrite, just as with data file rewrites, by specifying which partitions to rewrite:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CALL&lt;/span&gt; &lt;span class="n"&gt;prod&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;system&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rewrite_position_delete_files&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'db.events'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;where&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'event_date &amp;gt;= DATE &lt;/span&gt;&lt;span class="se"&gt;''&lt;/span&gt;&lt;span class="s1"&gt;2026-02-01&lt;/span&gt;&lt;span class="se"&gt;''&lt;/span&gt;&lt;span class="s1"&gt;'&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you have already rewritten the data files and performance has not improved, inspect the delete file distribution. Rewriting data files can reduce fragmentation while leaving a heavy delete layer behind; in that case, the delete files need to be rewritten separately.&lt;/p&gt;

&lt;p&gt;Similar to all of the strategies described throughout this guide, keep the rewriting of delete files focused. There is little to be gained by rewriting delete files across the entire table if only a limited number of partitions are subject to upserts.&lt;/p&gt;

&lt;p&gt;Treating delete file maintenance as its own task also keeps read cost predictable. Otherwise, even with appropriately sized data files, the accumulated overhead of applying delete files can keep growing over time.&lt;/p&gt;

&lt;h2&gt;
  
  
  9. Lower the commit frequency for streaming workloads
&lt;/h2&gt;

&lt;p&gt;When writing to Iceberg from streaming or micro-batch applications, the commit frequency is one of the largest factors contributing to the overall cost multiplier in the system.&lt;/p&gt;

&lt;p&gt;Each commit generates a new snapshot and new metadata work: manifest updates plus new, small data files. If you commit every few seconds, you don't simply create small files; you create a long chain of snapshots and a continuous stream of metadata churn. Nothing "breaks," but planning slows down and maintenance must continually struggle to keep pace with the growing overhead.&lt;/p&gt;
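&lt;p&gt;The arithmetic alone makes the point:&lt;/p&gt;

```python
def snapshots_per_day(commit_interval_seconds: float) -> int:
    """Snapshots produced per day at a fixed commit cadence."""
    return int(86_400 // commit_interval_seconds)

# Committing every 10 seconds versus every 5 minutes is the difference
# between thousands of snapshots a day and a few hundred.
assert snapshots_per_day(10) == 8_640
assert snapshots_per_day(60) == 1_440
assert snapshots_per_day(300) == 288
```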

&lt;p&gt;The frustrating aspect is that teams typically attempt to resolve this issue by applying more compaction, while the true solution lies upstream: stop committing as often.&lt;/p&gt;

&lt;h3&gt;
  
  
  The benefits of modifying the commit frequency
&lt;/h3&gt;

&lt;p&gt;When you increase the interval between commits, you generally gain three tangible benefits at once.&lt;/p&gt;

&lt;p&gt;First, you generate fewer snapshots, resulting in fewer pieces of metadata that the engine has to evaluate during planning.&lt;/p&gt;

&lt;p&gt;Second, you generate fewer manifests / manifest updates overall.&lt;/p&gt;

&lt;p&gt;Third, each commit contains more data, resulting in larger files (or fewer small files) and therefore reduced compaction pressure.&lt;/p&gt;

&lt;p&gt;You're essentially lowering entropy at the source.&lt;/p&gt;

&lt;h3&gt;
  
  
  Structured Streaming in Spark: Set a sensible trigger interval
&lt;/h3&gt;

&lt;p&gt;A common antipattern is to configure structured streaming to run "as fast as possible" or with a very short trigger. If you are writing to Iceberg tables, avoid this practice unless you have a true requirement for sub-minute freshness.&lt;/p&gt;

&lt;p&gt;The following shows the configuration for setting a reasonable commit interval in PySpark:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;writeStream&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;iceberg&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;outputMode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;append&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;option&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;checkpointLocation&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;s3://prod-checkpoints/events/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;trigger&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;processingTime&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1 minute&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toTable&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prod.db.events&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If your service level agreement (SLA) allows it, increase the commit interval to 2–5 minutes. In most analytics lakes, this tradeoff is worthwhile: slightly increased data freshness lag in exchange for significantly decreased metadata churn and less maintenance overhead.&lt;/p&gt;

&lt;h3&gt;
  
  
  Flink: Commit frequency follows checkpointing
&lt;/h3&gt;

&lt;p&gt;For Flink, Iceberg commits typically follow the checkpoint intervals. If you checkpoint every 30 seconds, you are essentially committing every 30 seconds. That's a lot.&lt;/p&gt;

&lt;p&gt;A more reasonable interval would be minutes, not seconds, unless you are operating a low-latency serving pipeline.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// 5 minutes&lt;/span&gt;
&lt;span class="n"&gt;env&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;enableCheckpointing&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;300_000&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Ultimately, the best value will depend on recovery requirements and end-to-end latency needs. However, the underlying premise is the same: do not checkpoint so frequently that you turn your Iceberg table into a snapshot factory.&lt;/p&gt;

&lt;h3&gt;
  
  
  A simple method to select the interval
&lt;/h3&gt;

&lt;p&gt;Do not overcomplicate things. Ask yourself: what is the longest delay that your downstream consumers can tolerate for data freshness?&lt;/p&gt;

&lt;p&gt;If the response is "near real-time", you may still be fine at 1 minute. If the response is "a few minutes", take advantage of the situation and commit every few minutes.&lt;/p&gt;

&lt;p&gt;If the response is "we run dashboards hourly", then committing every 10 seconds is just self-imposed suffering.&lt;/p&gt;
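&lt;p&gt;If you want to encode that decision, a rule-of-thumb helper is enough. The thresholds below are illustrative, not a standard; the only real input is the freshness your consumers actually need:&lt;/p&gt;

```python
def suggest_trigger_interval(freshness_sla_s: int) -> str:
    """Map a downstream freshness SLA (seconds) to a streaming trigger interval.

    The idea: always pick the longest interval the SLA tolerates, because
    every extra commit is pure metadata overhead. Cut-offs are illustrative.
    """
    if freshness_sla_s <= 120:
        return "1 minute"
    if freshness_sla_s <= 900:
        return "5 minutes"
    return "10 minutes"

assert suggest_trigger_interval(60) == "1 minute"      # near real-time
assert suggest_trigger_interval(600) == "5 minutes"    # "a few minutes" is fine
assert suggest_trigger_interval(3600) == "10 minutes"  # hourly dashboards
```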

&lt;h3&gt;
  
  
  Sanity check
&lt;/h3&gt;

&lt;p&gt;If you observe thousands of snapshots being created daily for a single table, this is typically an indication that your commit cadence is too aggressive for an analytics lake. You can certainly use Iceberg as a means of generating data in this manner - it is designed to be correct - but you will pay for it in terms of planning overhead and ongoing maintenance.&lt;/p&gt;

&lt;p&gt;Lowering the commit frequency is one of the few optimizations that decreases cost and improves stability regardless of how you tune compaction. Fix it early: once you have dozens or hundreds of streaming-written tables, this behavior dictates your operational overhead.&lt;/p&gt;

&lt;h2&gt;
  
  
  10. Stop the repair loop; fix the write path
&lt;/h2&gt;

&lt;p&gt;Write paths that produce too many small files or heavily skewed partitions turn compaction into a never-ending battle. As long as the write path keeps recreating the same problems, the lake will keep drifting back into an unhealthy state.&lt;/p&gt;

&lt;p&gt;The majority of "we need more compaction" situations are actually "our write path is poorly configured."&lt;/p&gt;

&lt;h3&gt;
  
  
  Begin with Distribution Mode
&lt;/h3&gt;

&lt;p&gt;Small files are a common result of poor data distribution at write time. A typical scenario: one writer task holds the majority of the data for a partition and emits a couple of large files, while the remaining tasks emit many small ones. Worse, if the distribution is unstable between batches, you will see fragmentation no matter how often you compact.&lt;/p&gt;

&lt;p&gt;Iceberg allows you to configure the way data is written across multiple writers. A good baseline configuration for many workloads is hash, as it generally spreads rows out more evenly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;prod&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;events&lt;/span&gt; &lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;TBLPROPERTIES&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="s1"&gt;'write.distribution-mode'&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'hash'&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This does not completely remove the necessity for compaction, but it helps slow down how quickly fragmentation occurs again.&lt;/p&gt;

&lt;h3&gt;
  
  
  Establish a Target File Size for Writers at the Table Level
&lt;/h3&gt;

&lt;p&gt;When writers have no target file size, file sizes vary widely across engines and jobs. Some writers produce 16MB files, others 1GB files, and compaction continually attempts to normalize the mess.&lt;/p&gt;

&lt;p&gt;Set a target file size for writers at the table level and keep it consistent:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;prod&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;events&lt;/span&gt; &lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;TBLPROPERTIES&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="s1"&gt;'write.target-file-size-bytes'&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'536870912'&lt;/span&gt; &lt;span class="c1"&gt;-- 512MB&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With a table-level target file size in place, compaction transitions from "fix everything" to "fix the outliers."&lt;/p&gt;

&lt;h3&gt;
  
  
  Optimize the Writer and Not Just the Table
&lt;/h3&gt;

&lt;p&gt;Writers also produce small files when the number of write tasks is mismatched with the amount of data in each micro-batch or partition. The easiest fix is to adjust the degree of parallelism at the point of write.&lt;/p&gt;

&lt;p&gt;If you are experiencing hundreds of files per partition per batch, consider reducing the number of output partitions prior to writing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;df&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;repartition&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;//&lt;/span&gt; &lt;span class="n"&gt;pick&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="n"&gt;number&lt;/span&gt; &lt;span class="n"&gt;that&lt;/span&gt; &lt;span class="n"&gt;corresponds&lt;/span&gt; &lt;span class="n"&gt;to&lt;/span&gt; &lt;span class="n"&gt;your&lt;/span&gt; &lt;span class="n"&gt;cluster&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;batch&lt;/span&gt; &lt;span class="n"&gt;size&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;writeTo&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prod.db.events&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You don't have to find the optimal number. All you need to do is stop creating 1000 small files because your job happened to run with 1000 tasks.&lt;/p&gt;
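&lt;p&gt;As a rough starting point, you can derive the repartition count from the micro-batch size and your target file size. This is a sketch, not an Iceberg API; the helper name and the 512MB default are illustrative:&lt;/p&gt;

```python
import math

def output_partitions(batch_bytes, target_file_bytes=512 * 1024 * 1024):
    """One output partition per target-sized file, never fewer than one."""
    return max(1, math.ceil(batch_bytes / target_file_bytes))

# A 10GB micro-batch with a 512MB target yields 20 output partitions
print(output_partitions(10 * 1024**3))
```

&lt;p&gt;Feed the result into repartition() instead of letting the task count dictate the file count.&lt;/p&gt;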

&lt;h3&gt;
  
  
  Do Not Create "Hot Partitions" by Design
&lt;/h3&gt;

&lt;p&gt;Some datasets inherently skew towards certain items, such as one customer producing 70% of the events, or a specific date receiving a massive backfill. When a partitioning scheme directs a large amount of data into a single partition, you will continually be fighting it with compaction.&lt;/p&gt;

&lt;p&gt;This is one of the few instances where adjusting the partitioning scheme to reduce skew can greatly reduce compaction load. A common strategy is to add another dimension to the partitioning scheme (or create a derived shard key) so that a single logical partition does not become a physical hotspot.&lt;/p&gt;

&lt;p&gt;You do not need to re-design the entire table. One additional dimension may be sufficient to prevent the worst skew.&lt;/p&gt;
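&lt;p&gt;A minimal sketch of a derived shard key, assuming a hypothetical hot customer_id column; the shard count and function name are illustrative. You would store the shard as a column and add it to the partition spec:&lt;/p&gt;

```python
import hashlib

def shard_key(customer_id, num_shards=8):
    """Stable shard derived from a hot key, so one customer's rows
    spread across num_shards physical partitions."""
    digest = hashlib.sha256(str(customer_id).encode()).hexdigest()
    return int(digest, 16) % num_shards

# The same customer always lands on the same shard, so reads
# only fan out by num_shards
print(shard_key("customer-42"))
```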

&lt;h2&gt;
  
  
  11. Maintain Metadata With Compaction
&lt;/h2&gt;

&lt;p&gt;You can obtain the desired file sizes, reduce the number of files, and yet still end up with a table whose performance and cost characteristics degrade over time. This is typically a metadata issue and not a physical layout issue.&lt;/p&gt;

&lt;p&gt;Every time you run compaction, you create a new snapshot. Every snapshot adds to the table's history, and manifests accumulate. The old metadata remains until something removes the history. If nothing does, the table's history grows deeper and more expensive to reason through, even if the physical data files appear clean.&lt;/p&gt;

&lt;p&gt;This is the most common trap: teams focus on rewrite_data_files and neglect what happens to the snapshots and manifests afterwards.&lt;/p&gt;

&lt;p&gt;In general, compaction should be followed immediately by snapshot expiration. If you keep thousands of historical snapshots around "just in case," the engine still has to traverse that lineage during planning. Over time, this shows up as slower metadata reads and longer planning times.&lt;/p&gt;

&lt;p&gt;Typically, a snapshot expiration would look something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CALL&lt;/span&gt; &lt;span class="n"&gt;prod&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;system&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;expire_snapshots&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'db.events'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;retain_last&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The actual number of retained snapshots will vary based on your rollback and time-travel policies. What matters is establishing a clear retention policy: infinite retention is rarely what you truly need.&lt;/p&gt;

&lt;p&gt;After expiring snapshots, it is also beneficial to perform manifest consolidation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CALL&lt;/span&gt; &lt;span class="n"&gt;prod&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;system&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rewrite_manifests&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'db.events'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Even if your file sizes are acceptable, fragmented manifests will require the planner to open and evaluate numerous small metadata files. Manifest consolidation will reduce the fan-out and stabilize the planner costs.&lt;/p&gt;

&lt;p&gt;Then there is orphan removal. Failed jobs, speculative tasks, and partial rewrites leave files in object storage that are no longer referenced by the table. Over months, this adds up to significant storage cost. Removing these orphans keeps the lake's storage footprint predictable.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CALL&lt;/span&gt; &lt;span class="n"&gt;prod&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;system&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;remove_orphan_files&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'db.events'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;older_than&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;TIMESTAMP&lt;/span&gt; &lt;span class="s1"&gt;'2026-02-10 00:00:00'&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The older_than guardrail is essential. You cannot afford to be racing with active writers in a production environment. Safety is more important than aggression.&lt;/p&gt;

&lt;p&gt;What complicates this is that the above actions are not independent. Snapshot retention determines what you can remove. Update frequency determines how often you need to rewrite manifests. Safe orphan removal depends on when commits occur and on active streaming jobs.&lt;/p&gt;

&lt;p&gt;Therefore, compaction is not simply a single maintenance action. It is part of a life cycle: data files, snapshots, manifests, and object storage move in tandem.&lt;/p&gt;

&lt;p&gt;At small scales, you can run these actions manually and get away with it. At larger scales, you need to be able to enforce consistency. Tables drift in various ways at varying rates. Without coordinated metadata maintenance, you will continue to repair file layout, while the metadata layer quietly continues to grow.&lt;/p&gt;

&lt;p&gt;Your goal is not simply to minimize the number of small files. Your goal is to maintain a table whose performance and cost characteristics remain stable over time. Compaction addresses the data layer. The three above actions address the metadata layer to prevent it from becoming the next bottleneck.&lt;/p&gt;
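&lt;p&gt;The life cycle above can be sketched as an ordered pipeline. This is illustrative structure, not a LakeOps or Iceberg API; each step name stands in for the corresponding Spark procedure:&lt;/p&gt;

```python
# Order matters: fix layout first, then let metadata cleanup take effect.
MAINTENANCE_ORDER = [
    "rewrite_data_files",   # compaction: fix the physical layout
    "expire_snapshots",     # drop history so old metadata can be released
    "rewrite_manifests",    # consolidate the remaining metadata
    "remove_orphan_files",  # reclaim unreferenced storage
]

def plan_maintenance(table):
    """Return the ordered (table, operation) pairs to run."""
    return [(table, step) for step in MAINTENANCE_ORDER]

for table, step in plan_maintenance("prod.db.events"):
    print(table, step)
```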

&lt;h2&gt;
  
  
  Recap and Conclusion
&lt;/h2&gt;

&lt;p&gt;Compaction in Iceberg is not about scheduling rewrite_data_files. It is about maintaining the alignment between the layout, deletes, and metadata of a table with its actual usage.&lt;/p&gt;

&lt;p&gt;Compaction plus table maintenance is now a coordination problem. A control plane that continuously assesses table health, prioritizes the partitions that most need compaction, and coordinates compaction with metadata maintenance, rather than treating these as separate jobs, is the natural first step.&lt;/p&gt;

&lt;p&gt;Manual work and scripting are typically the alternatives.&lt;/p&gt;

&lt;p&gt;We previously reviewed the practical aspects of this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Use bin-pack to control the file count&lt;/li&gt;
&lt;li&gt;  Escalate to sorting only when scan efficiency is the limiting factor&lt;/li&gt;
&lt;li&gt;  Use gates to avoid rewriting healthy data&lt;/li&gt;
&lt;li&gt;  Limit the scope to make maintenance predictable&lt;/li&gt;
&lt;li&gt;  Proactively resolve delete-heavy partitions&lt;/li&gt;
&lt;li&gt;  Reduce commit entropy in streaming jobs&lt;/li&gt;
&lt;li&gt;  Fix the write path to stop repairing the same issues&lt;/li&gt;
&lt;li&gt;  Connect compaction to snapshot expiration, manifest rewrites, and orphan removal&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Most importantly, we connected compaction to snapshot expiration, manifest rewrites, and orphan removal - because the physical data layout and the metadata health are interdependent.&lt;/p&gt;

&lt;p&gt;Your goal is not to simply minimize the number of small files. Your goal is to maintain a lake that remains predictable - in terms of performance, cost, and operational overhead - as it grows.&lt;/p&gt;

&lt;p&gt;If you are operating Iceberg in production, I would appreciate your feedback regarding what has worked (and failed) for you. Real world patterns are always more interesting than theoretical ones.&lt;/p&gt;

&lt;p&gt;Thank you for reading 🍺&lt;/p&gt;

</description>
      <category>dataengineering</category>
      <category>iceberg</category>
      <category>snowflake</category>
    </item>
    <item>
      <title>Iceberg Rewrite Manifest Files: A Guide</title>
      <dc:creator>Joni Sar</dc:creator>
      <pubDate>Sun, 08 Feb 2026 15:31:17 +0000</pubDate>
      <link>https://dev.to/jonisar/iceberg-rewrite-manifest-files-a-guide-m5f</link>
      <guid>https://dev.to/jonisar/iceberg-rewrite-manifest-files-a-guide-m5f</guid>
      <description>&lt;h3&gt;
  
  
  Iceberg Rewrite Manifest Files: A Guide
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi8m6hfp8trhca4vpg19m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi8m6hfp8trhca4vpg19m.png" width="800" height="568"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A data engineer happily running into the quicksand&lt;/p&gt;

&lt;p&gt;Manifest rewrites are a critical ongoing operation in Iceberg table maintenance.&lt;/p&gt;

&lt;p&gt;Data keeps landing, queries stay correct, and nothing looks obviously wrong. But over time, query planning takes longer, metadata reads increase, and latency creeps up even though the amount of data scanned hasn’t really changed. In most production systems, the root cause is not data layout — it’s &lt;strong&gt;metadata&lt;/strong&gt;, and specifically how manifest files accumulate and degrade over time.&lt;/p&gt;

&lt;p&gt;Manifest files are central to how Iceberg works. They’re what allow engines to plan efficiently without listing object storage. But with frequent commits, streaming writes, deletes, and long snapshot histories, manifests naturally fragment. Iceberg doesn’t reorganize them automatically, so planning cost quietly grows until it starts to matter.&lt;/p&gt;

&lt;p&gt;This guide focuses on &lt;strong&gt;rewrite manifests&lt;/strong&gt;: what they actually do, when they help, and how to run them correctly in production. You’ll learn how to detect when manifest rewrites are needed, how they interact with snapshot expiration and compaction, and why running them in isolation often delivers disappointing results.&lt;/p&gt;

&lt;p&gt;We’ll also contrast two operational models: managing all of this manually with scripts and schedules, and handling it continuously through a &lt;a href="https://lakeops.dev" rel="noopener noreferrer"&gt;&lt;strong&gt;Control&lt;/strong&gt; &lt;strong&gt;Plane&lt;/strong&gt; like LakeOps&lt;/a&gt;, which optimizes table maintenance based on real workload behavior instead of fixed timers.&lt;/p&gt;

&lt;p&gt;The rest of the article guides you through manual optimization with practical examples. No spec theory, no generic advice — just what actually works when Iceberg tables grow, change, and age in real systems.&lt;/p&gt;

&lt;p&gt;Let’s begin then 🙂&lt;/p&gt;

&lt;h3&gt;
  
  
  Smart Automation vs Manual Scripts
&lt;/h3&gt;

&lt;p&gt;Before getting into mechanics, it’s important to understand the two main ways teams approach manifest management: automated maintenance with a Control Plane like &lt;a href="https://lakeops.dev" rel="noopener noreferrer"&gt;&lt;strong&gt;LakeOps&lt;/strong&gt;&lt;/a&gt;, or performing the operation by hand, on an ongoing basis, with generic scripts.&lt;/p&gt;

&lt;h4&gt;
  
  
  Continuous Optimization with a Control Plane
&lt;/h4&gt;

&lt;p&gt;A &lt;strong&gt;control plane&lt;/strong&gt; is a layer that sits above your data lake, catalogs, and query engines and takes responsibility for &lt;em&gt;operating and optimizing&lt;/em&gt; tables over time. Iceberg defines table structure and guarantees correctness, but it intentionally does not decide &lt;strong&gt;when&lt;/strong&gt;, &lt;strong&gt;where&lt;/strong&gt;, or &lt;strong&gt;how aggressively&lt;/strong&gt; maintenance should run. That operational and optimization gap is exactly what a control plane fills.&lt;/p&gt;

&lt;p&gt;Instead of running maintenance because a schedule says it’s time, a control plane continuously &lt;strong&gt;optimizes&lt;/strong&gt; tables based on what is actually happening in the system. Operations run only when and where they are needed, or according to explicit policies you define, rather than blindly across all tables.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://lakeops.dev/" rel="noopener noreferrer"&gt;&lt;strong&gt;LakeOps&lt;/strong&gt;&lt;/a&gt; acts as a control plane for Iceberg by continuously analyzing telemetry from Iceberg catalogs and query engines. Using this data, LakeOps builds a live understanding of how each table behaves in practice.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://lakeops.dev" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffqi5oes035kbuki5fgus.png" width="800" height="366"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Continuous Optimization and Maintenance with an Iceberg Control Plane (source: lakeops.dev)&lt;/p&gt;

&lt;p&gt;From that telemetry, LakeOps continuously optimizes table maintenance. Manifest rewrites are triggered only when metadata fragmentation begins to impact planning or cost. Snapshot expiration runs only when retained history no longer provides real value. Compaction is optimized continuously to reduce small files before they create downstream metadata pressure. Orphan cleanup runs when metadata and data files are no longer referenced and can safely be removed.&lt;/p&gt;

&lt;p&gt;Coordination is central to optimization. Rewrite manifests, snapshot expiration, compaction, and cleanup are not independent jobs. They are executed as part of a single, continuous optimization loop that ensures only the required operations run, only on the tables that need them, and only at the point where they actually improve performance or cost.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://lakeops.dev/" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F79itrha3wsuk3c3n8zhw.png" width="800" height="517"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Automating smart rewrite manifest operations (source: lakeops.dev)&lt;/p&gt;

&lt;p&gt;Engineers don’t tune per-table schedules or chase drifting thresholds. They decide &lt;em&gt;what&lt;/em&gt; should be optimized and &lt;em&gt;within what constraints&lt;/em&gt;, and the control plane decides &lt;em&gt;when and where&lt;/em&gt; to run each operation. The result is stable metadata, predictable performance, and far less work.&lt;/p&gt;

&lt;p&gt;Learn more:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://overcast.blog/9-apache-iceberg-table-maintenance-tools-you-should-know-df864ed7a6d5" rel="noopener noreferrer"&gt;&lt;strong&gt;9 Apache Iceberg Table Maintenance Tools You Should Know&lt;/strong&gt;&lt;/a&gt;&lt;a href="https://overcast.blog/9-apache-iceberg-table-maintenance-tools-you-should-know-df864ed7a6d5" rel="noopener noreferrer"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Start at the beginning: What Manifest Files Are
&lt;/h3&gt;

&lt;p&gt;Manifest files are the core metadata units Iceberg uses to describe &lt;em&gt;which data files exist&lt;/em&gt; and &lt;em&gt;what is inside them&lt;/em&gt;. They sit between snapshots and actual data files and are the reason Iceberg can plan queries efficiently without scanning directories or listing objects in storage.&lt;/p&gt;

&lt;p&gt;Each manifest file is essentially a list of data file entries. For every data file, the manifest records:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  the partition values for that file&lt;/li&gt;
&lt;li&gt;  record count&lt;/li&gt;
&lt;li&gt;  per-column statistics such as min and max values&lt;/li&gt;
&lt;li&gt;  file size and other low-level attributes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When a query runs, the engine does &lt;strong&gt;not&lt;/strong&gt; discover data files by walking object storage. Instead, it reads manifests referenced by the current snapshot and uses the stored statistics to decide which data files can be skipped entirely. This is how Iceberg enables predicate and partition pruning at planning time.&lt;/p&gt;

&lt;p&gt;Manifests are immutable. Every commit creates new metadata. When a write happens, Iceberg typically creates one or more new manifest files describing the files added or removed by that commit. Over time, a snapshot references many manifests, some created recently and some carried forward from older snapshots.&lt;/p&gt;

&lt;p&gt;This design is powerful, but it has predictable operational consequences.&lt;/p&gt;

&lt;p&gt;Frequent small commits, especially from streaming or micro-batch ingestion, tend to produce many small manifest files. For example, a streaming job that commits every minute may generate hundreds or thousands of manifests per day, each describing only a handful of data files. From Iceberg’s point of view this is correct, but for the query engine it means more metadata to read and evaluate during planning.&lt;/p&gt;
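&lt;p&gt;The arithmetic is worth making explicit. Assuming at least one new manifest per commit (a typical lower bound, not a guarantee), commit cadence translates directly into manifest growth:&lt;/p&gt;

```python
def manifests_per_day(trigger_seconds):
    """Lower-bound manifest growth for a job committing once per trigger."""
    return (24 * 60 * 60) // trigger_seconds

print(manifests_per_day(60))   # one-minute commits: 1440 manifests/day
print(manifests_per_day(600))  # ten-minute commits: 144 manifests/day
```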

&lt;p&gt;Another issue is &lt;strong&gt;manifest clustering&lt;/strong&gt;. Manifests are not automatically reorganized around how tables are queried. If files are appended over time with mixed partitions or evolving data distributions, manifests may contain entries that are poorly aligned with common filters. The engine still prunes correctly, but it has to examine more metadata to do so.&lt;/p&gt;

&lt;p&gt;Snapshots make this worse if they are not expired. Each snapshot retains references to the manifests that describe its table state. Even if newer snapshots supersede old ones, the metadata remains live as long as those snapshots are kept. This means manifests that are no longer useful for active queries still participate in metadata reads and storage costs.&lt;/p&gt;

&lt;p&gt;The net effect is subtle but significant. Query planning time increases even though data size stays flat. Metadata I/O grows quietly. Storage costs creep up due to retained metadata. None of this breaks correctness, which is why it often goes unnoticed until performance degrades.&lt;/p&gt;

&lt;p&gt;Manifest rewrites exist specifically to address these issues. They allow Iceberg to reorganize and consolidate manifests so that the metadata layer reflects the &lt;em&gt;current&lt;/em&gt; table state and access patterns, rather than the historical accident of how data arrived over time.&lt;/p&gt;

&lt;h3&gt;
  
  
  What Rewrite Manifests Does
&lt;/h3&gt;

&lt;p&gt;A &lt;strong&gt;rewrite manifests&lt;/strong&gt; operation restructures the metadata layer of an Iceberg table without touching the data itself.&lt;/p&gt;

&lt;p&gt;At a high level, Iceberg takes the manifest files referenced by the &lt;em&gt;current snapshot&lt;/em&gt;, reads the data-file entries inside them, and writes a new set of manifest files that describe the &lt;strong&gt;exact same live data files&lt;/strong&gt;, just in a layout that’s cheaper for engines to plan against. The commit updates table metadata to point the current snapshot at the new manifests. The old manifests become obsolete once nothing references them anymore (usually after snapshot expiration and cleanup).&lt;/p&gt;

&lt;p&gt;This is a metadata rewrite, not a data rewrite. No Parquet/ORC/Avro files are rewritten.&lt;/p&gt;

&lt;h4&gt;
  
  
  What actually improves
&lt;/h4&gt;

&lt;p&gt;Rewrite manifests helps in three very concrete ways.&lt;/p&gt;

&lt;p&gt;It reduces manifest fan-out. When you have many small commits (streaming, micro-batch), you often end up with lots of tiny manifest files. Each query has to open and evaluate those manifests during planning. Rewriting consolidates many small manifests into fewer, larger ones, which reduces metadata I/O and planning latency.&lt;/p&gt;

&lt;p&gt;It aligns manifest layout with partitioning. Iceberg sorts data-file entries in manifests by fields in the partition spec. In practice, this tends to make partition pruning cheaper because related entries are adjacent and engines do less work to decide what to skip.&lt;/p&gt;

&lt;p&gt;It removes “historical write shape” from the current snapshot. Without rewrites, manifests reflect how data arrived over time, not how it’s queried. Rewriting reorganizes metadata around the current state, which is usually what you actually care about for planning.&lt;/p&gt;

&lt;h4&gt;
  
  
  What rewrite manifests does not do
&lt;/h4&gt;

&lt;p&gt;It does not compact data files. Tiny data files stay tiny. It does not change partitioning or rewrite records.&lt;/p&gt;

&lt;p&gt;It does not delete old manifests by itself. If old snapshots still reference them, they’ll remain. Cleanup is a separate step.&lt;/p&gt;

&lt;h4&gt;
  
  
  Practical code examples
&lt;/h4&gt;

&lt;p&gt;Below are a few examples that are actually useful in day-to-day operations, not just “hello world”.&lt;/p&gt;

&lt;p&gt;1) Measure the problem before you touch anything&lt;/p&gt;

&lt;p&gt;Start by inspecting the metadata table that lists manifests. Don’t assume column names — Iceberg versions and engines can differ — so first look at the schema:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;\-- Spark: inspect the manifests metadata table schema  
DESCRIBE TABLE EXTENDED prod.db.my\_table.manifests;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then get a baseline count:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;\-- How many manifests does the current snapshot reference?  
SELECT COUNT(\*) AS manifest\_count  
FROM prod.db.my\_table.manifests;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If this number grows steadily week over week while the table isn’t exploding in size, planning overhead is usually creeping up.&lt;/p&gt;

&lt;p&gt;2) Rewrite manifests via Spark SQL procedure (the most common operational path)&lt;/p&gt;

&lt;p&gt;This runs the rewrite in parallel using Spark:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CALL prod.system.rewrite_manifests('db.my_table');
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;In Spark, this returns a small result set with counters (how many manifests were rewritten, how many were added). In practice, you run the call, note the counters, and then re-check &lt;code&gt;my_table.manifests&lt;/code&gt; to see the manifest count drop.&lt;/p&gt;

&lt;p&gt;3) Rewrite manifests for a specific partition spec (when you’ve done partition evolution)&lt;/p&gt;

&lt;p&gt;If your table has evolved partition specs over time, you may want to rewrite manifests for a particular spec id:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CALL prod.system.rewrite\_manifests(  
  table   =&amp;gt; 'db.my\_table',  
  spec\_id =&amp;gt; 1  
);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is useful when an older spec still contributes a lot of manifest fragmentation and you want to target it instead of doing everything blindly.&lt;/p&gt;

&lt;p&gt;4) Disable Spark caching if executors get memory pressure during rewrites&lt;/p&gt;

&lt;p&gt;Some environments prefer to avoid caching during maintenance to reduce executor memory footprint:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CALL prod.system.rewrite_manifests('db.my_table', false);
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;If you’ve ever seen maintenance jobs destabilize executor memory, this is one of the first knobs to reach for.&lt;/p&gt;

&lt;p&gt;5) Validate the effect (simple but important)&lt;/p&gt;

&lt;p&gt;After the rewrite, validate that you actually improved the metadata shape:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT COUNT(\*) AS manifest\_count\_after  
FROM prod.db.my\_table.manifests;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the count doesn’t drop (or drops only slightly), the usual causes are that snapshots weren’t expired (old manifests still referenced), or the table’s write pattern keeps producing fragmentation faster than your maintenance cadence.&lt;/p&gt;

&lt;p&gt;That’s the point where you either tighten the full maintenance loop (expire snapshots, rewrite manifests, remove orphans, and revisit compaction) or stop doing this manually and let a control plane keep it stable continuously.&lt;/p&gt;
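&lt;p&gt;If you script this loop, it helps to assert the outcome rather than eyeball it. A minimal check, with an illustrative threshold and function name:&lt;/p&gt;

```python
def rewrite_was_effective(before, after, min_reduction=0.30):
    """Flag a manifest rewrite that failed to meaningfully shrink the
    manifest count, e.g. because old snapshots still hold references."""
    if before == 0:
        return True
    return (before - after) / before >= min_reduction

# 100 manifests down to 40 clears a 30% reduction threshold
print(rewrite_was_effective(100, 40))
```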

&lt;h3&gt;
  
  
  When You Should Rewrite Manifests
&lt;/h3&gt;

&lt;p&gt;Manifest rewrites are not something you run on a fixed schedule “just in case”. They are most effective when there is a clear signal that metadata, not data, is becoming the bottleneck.&lt;/p&gt;

&lt;p&gt;The most common trigger is &lt;strong&gt;planning getting slower while data size stays flat&lt;/strong&gt;. If query runtimes increase but the amount of data scanned is roughly the same, the extra time is often spent in planning and metadata evaluation. This is especially visible in engines that log planning or analysis time separately.&lt;/p&gt;

&lt;p&gt;Another strong signal is &lt;strong&gt;manifest growth that outpaces data growth&lt;/strong&gt;. If storage size grows slowly but the number of manifests keeps climbing, you are accumulating metadata fragmentation. This usually happens in tables with frequent commits, even if each commit is small.&lt;/p&gt;

&lt;p&gt;Tables that receive &lt;strong&gt;streaming or micro-batch writes&lt;/strong&gt; are prime candidates. Frequent commits tend to generate many small manifests. Even if data files are reasonably sized, the metadata layer becomes increasingly expensive to process.&lt;/p&gt;

&lt;p&gt;A very common real-world pattern is a table that “looks healthy” in storage metrics but becomes steadily slower to query over weeks. Nothing is broken, nothing obvious changed, but planning time creeps up. That is almost always a manifest problem.&lt;/p&gt;
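&lt;p&gt;One way to turn that signal into a check: track manifests per GB over time and flag the table when the ratio climbs while data stays flat. The 1.5x threshold here is illustrative, not a recommendation:&lt;/p&gt;

```python
def fragmentation_trend(samples):
    """samples: chronological (manifest_count, data_gb) observations.
    True when manifests-per-GB grew by more than 1.5x, i.e. metadata
    is fragmenting faster than data is growing."""
    ratios = [count / max(gb, 1) for count, gb in samples]
    return ratios[-1] > ratios[0] * 1.5

print(fragmentation_trend([(200, 100), (500, 110)]))  # True
```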

&lt;h3&gt;
  
  
  Managing Manifest Rewrites Manually
&lt;/h3&gt;

&lt;p&gt;If you don’t use a &lt;a href="https://lakeops.dev" rel="noopener noreferrer"&gt;&lt;strong&gt;control plane&lt;/strong&gt;&lt;/a&gt;, the following sequence reflects what works well in production if done right and in context.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Inspect Metadata Health
&lt;/h3&gt;

&lt;p&gt;Before you decide to rewrite manifests, you need &lt;strong&gt;visibility into the live metadata&lt;/strong&gt; — not guesswork, not periodic dashboards, but concrete numbers that reflect how fragmented the metadata has become.&lt;/p&gt;

&lt;p&gt;Iceberg exposes &lt;strong&gt;metadata tables&lt;/strong&gt; that you can query just like regular tables. These include tables like &lt;code&gt;…$manifests&lt;/code&gt;, &lt;code&gt;…$files&lt;/code&gt;, &lt;code&gt;…$snapshots&lt;/code&gt;, etc. You can use these directly in SQL to inspect current state and spot trouble early.&lt;/p&gt;

&lt;h4&gt;
  
  
  Iceberg stores metadata in layers:
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-   A **manifest list** per snapshot points to all manifests for that snapshot.
-   Each **manifest file** lists a subset of data files, partition values, and column statistics (min/max/null counts).
-   Manifests may be reused across snapshots.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As a result:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Lots of small commits → many small manifests.&lt;/li&gt;
&lt;li&gt;  Old snapshots hold onto old manifests.&lt;/li&gt;
&lt;li&gt;  Query engines read manifests during plan time to prune partitions/files.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If manifests are fragmented or numerous, query planning becomes slow because engines read and evaluate many metadata files before they touch actual data.&lt;/p&gt;

&lt;p&gt;This is why &lt;strong&gt;metadata health matters early, not late&lt;/strong&gt;.&lt;/p&gt;
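&lt;p&gt;As a rough sketch of the relationship: planning cost grows roughly linearly with the number of manifests the engine must open. The per-manifest and fixed costs below are illustrative assumptions, not measured Iceberg numbers.&lt;/p&gt;

```python
# Back-of-envelope model of why manifest count drives planning time:
# the engine reads every manifest referenced by the current snapshot
# before it touches any data. All costs here are illustrative.
def planning_cost_ms(manifest_count: int,
                     per_manifest_ms: float = 2.0,
                     fixed_ms: float = 50.0) -> float:
    """Estimated planning time in milliseconds (toy model)."""
    return fixed_ms + manifest_count * per_manifest_ms

print(planning_cost_ms(50))     # healthy table: 150.0
print(planning_cost_ms(2000))   # fragmented metadata: 4050.0
```

&lt;p&gt;The exact constants vary by engine and storage; the point is the linear dependence on manifest count.&lt;/p&gt;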

&lt;h4&gt;
  
  
  What to Look At
&lt;/h4&gt;

&lt;p&gt;Here are the core checks you should be doing regularly — ideally automated — to monitor manifest health.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1) Count the Current Manifests&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Run a live count of manifests referenced by the current snapshot:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT COUNT(\*) AS active\_manifest\_count  
FROM prod.db.my\_table$manifests;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A sudden jump in this number relative to data size usually correlates with planning overhead.&lt;br&gt;&lt;br&gt;
 A steady climb over time, without data volume growth, is a strong indicator your metadata is fragmenting.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2) Look at Files per Manifest&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Iceberg metadata stores statistics such as file counts per manifest. Pull a distribution:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT  
  CASE  
    WHEN record\_count &amp;lt; 10 THEN '&amp;lt;10 rows'  
    WHEN record\_count BETWEEN 10 AND 100 THEN '10–100 rows'  
    ELSE '100+ rows'  
  END AS manifest\_size\_bucket,  
  COUNT(\*) AS manifests  
FROM prod.db.my\_table$manifests  
GROUP BY 1  
ORDER BY 1;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you see &lt;strong&gt;lots of manifests tracking very few files&lt;/strong&gt;, that is fragmentation: many small manifests, typically produced by tiny commits, that inflate planning work.&lt;/p&gt;

&lt;p&gt;Conversely, if most manifests are dense, a rewrite will buy you little. The shape of this distribution tells you whether maintenance is worth scheduling.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3) Compare Manifests to Data Growth&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you track how data size and manifest count change together, you can spot divergence:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;\-- number of data files  
SELECT COUNT(\*) AS data\_file\_count  
FROM prod.db.my\_table$files;

\-- number of manifests  
SELECT COUNT(\*) AS manifest\_count  
FROM prod.db.my\_table$manifests;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If &lt;code&gt;manifest_count&lt;/code&gt; grows faster than &lt;code&gt;data_file_count&lt;/code&gt;, that’s another sign of metadata inefficiency.&lt;/p&gt;
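&lt;p&gt;As a toy illustration of this check, here is a hypothetical helper that flags drift when the manifest-to-data-file ratio crosses a threshold. The 0.25 cutoff is an assumption to tune per table, not an Iceberg default.&lt;/p&gt;

```python
# Hypothetical drift check: too many manifests per data file suggests
# metadata fragmentation. The 0.25 ratio threshold is an assumption.
def manifests_drifting(data_file_count: int, manifest_count: int,
                       max_ratio: float = 0.25) -> bool:
    """Return True when manifests are disproportionate to data files."""
    if data_file_count == 0:
        return manifest_count > 0
    return manifest_count / data_file_count > max_ratio

print(manifests_drifting(5000, 2000))  # ratio 0.4: True
print(manifests_drifting(5000, 100))   # ratio 0.02: False
```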

&lt;p&gt;&lt;strong&gt;4) Look at Snapshots (Optional but Useful)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Snapshots tell you how many historical versions you’re retaining, which impacts how many manifests persist:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT  
  committed\_at,  
  snapshot\_id  
FROM prod.db.my\_table$snapshots  
ORDER BY committed\_at DESC  
LIMIT 10;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Long snapshot histories mean old manifests may still be referenced and not cleaned up until expiration happens.&lt;/p&gt;

&lt;h4&gt;
  
  
  Interpreting the Results
&lt;/h4&gt;

&lt;p&gt;Here are practical heuristics data engineers use:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;High manifest count with small average manifest size&lt;/strong&gt; → metadata fragmentation (good candidate for rewrite).&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Stable manifest count but slow query planning&lt;/strong&gt; → the problem might be clustering, not count; manifest rewrites can help.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Lots of snapshots older than retention needs&lt;/strong&gt; → metadata is being kept too long; expire them first so rewrites can be effective.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Manifest growth outpacing data file growth&lt;/strong&gt; → metadata is drifting away from the current shape of data.&lt;/li&gt;
&lt;/ul&gt;
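&lt;p&gt;These heuristics can be sketched as a small decision helper. The thresholds (100 manifests, 10 files per manifest) are illustrative assumptions, and the function names are not Iceberg APIs:&lt;/p&gt;

```python
# Sketch: turn the heuristics above into maintenance recommendations.
# All thresholds are illustrative and should be tuned per table.
def recommend_maintenance(manifest_count: int,
                          avg_files_per_manifest: float,
                          snapshots_retained: int,
                          snapshots_needed: int) -> list:
    actions = []
    if snapshots_retained > snapshots_needed:
        # Expire first, or a rewrite cannot consolidate old manifests.
        actions.append("expire_snapshots")
    if manifest_count > 100 and 10 > avg_files_per_manifest:
        actions.append("rewrite_manifests")
    return actions

print(recommend_maintenance(2000, 2.5, 300, 20))
# ['expire_snapshots', 'rewrite_manifests']
```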

&lt;h4&gt;
  
  
  Example Scenario
&lt;/h4&gt;

&lt;p&gt;Imagine a streaming table ingesting updates every minute. You might see:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  5,000 data files&lt;/li&gt;
&lt;li&gt;  2,000 manifests&lt;/li&gt;
&lt;li&gt;  70% of manifests contain &amp;lt;10 files&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s a classic candidate for manifest consolidation: a smaller number of larger manifests will cut planning time dramatically, especially if queries filter on partitions that aren’t well clustered yet.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Expire Snapshots First
&lt;/h3&gt;

&lt;p&gt;Always expire snapshots &lt;strong&gt;before&lt;/strong&gt; rewriting manifests. This is not a best-practice nicety — it directly determines whether a manifest rewrite will actually do anything useful.&lt;/p&gt;

&lt;p&gt;The easiest way to achieve this is using a &lt;a href="https://lakeops.dev" rel="noopener noreferrer"&gt;&lt;strong&gt;Control Plane&lt;/strong&gt;&lt;/a&gt; for automated and optimized maintenance operations that include snapshot expirations in addition to manifest rewrites.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://lakeops.dev" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fai8r2rsuyf1p4w5ju96k.png" width="800" height="339"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Automated and optimized snapshot expiration with a Control Plane (source: lakeops.dev)&lt;/p&gt;

&lt;p&gt;Here’s a deep dive into the topic and practical solutions:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://overcast.blog/11-apache-iceberg-expired-snapshots-strategiesyou-should-know-ca7b81e87fb5" rel="noopener noreferrer"&gt;&lt;strong&gt;11 Expire Snapshots Optimizations for Apache Iceberg&lt;/strong&gt;&lt;/a&gt;&lt;a href="https://overcast.blog/11-apache-iceberg-expired-snapshots-strategiesyou-should-know-ca7b81e87fb5" rel="noopener noreferrer"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Snapshots are what keep manifests alive. Every snapshot references a specific set of manifest files that describe the table state at that point in time. As long as a snapshot exists, all of its manifests must remain reachable, even if they describe data that is no longer relevant for current queries.&lt;/p&gt;

&lt;p&gt;If you run a manifest rewrite while old snapshots are still retained, Iceberg can only optimize the manifests referenced by the &lt;em&gt;current&lt;/em&gt; snapshot. Older snapshots will continue to reference older manifests, which means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  old manifests stay in storage,&lt;/li&gt;
&lt;li&gt;  metadata fan-out remains higher than expected,&lt;/li&gt;
&lt;li&gt;  storage costs don’t drop,&lt;/li&gt;
&lt;li&gt;  and in some engines, planning still touches more metadata than necessary.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is the most common reason teams say “we ran rewrite manifests and it didn’t really help”.&lt;/p&gt;

&lt;h4&gt;
  
  
  Why snapshot expiration comes first
&lt;/h4&gt;

&lt;p&gt;Think of snapshot expiration as &lt;strong&gt;pruning the metadata graph&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Until you expire snapshots, Iceberg is obligated to preserve historical metadata for correctness and time travel. A rewrite cannot remove or consolidate manifests that are still referenced by retained snapshots. Expiring snapshots reduces the metadata surface area first, so the rewrite can actually consolidate what remains.&lt;/p&gt;

&lt;p&gt;In practice, snapshot expiration is what turns a rewrite from “cosmetic” into “effective”.&lt;/p&gt;

&lt;h4&gt;
  
  
  Inspect snapshot history before expiring
&lt;/h4&gt;

&lt;p&gt;Before expiring anything, look at what you’re retaining:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT  
  snapshot\_id,  
  committed\_at,  
  operation  
FROM prod.db.my\_table$snapshots  
ORDER BY committed\_at DESC;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In many production systems, you’ll find snapshots going back weeks or months, even though nobody ever queries historical versions beyond a few days.&lt;/p&gt;

&lt;p&gt;That’s usually accidental, not intentional.&lt;/p&gt;

&lt;h4&gt;
  
  
  Expire snapshots based on real needs
&lt;/h4&gt;

&lt;p&gt;Snapshot retention should reflect &lt;strong&gt;actual recovery and audit requirements&lt;/strong&gt;, not defaults or copy-pasted examples.&lt;/p&gt;

&lt;p&gt;If you only need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  a few days of rollback for operational safety, or&lt;/li&gt;
&lt;li&gt;  short-term auditability,&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;then retaining dozens or hundreds of snapshots actively hurts metadata efficiency with no upside.&lt;/p&gt;

&lt;p&gt;A common pattern is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  retain snapshots newer than a time threshold, and&lt;/li&gt;
&lt;li&gt;  always keep the last N snapshots as a safety net.&lt;/li&gt;
&lt;/ul&gt;
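&lt;p&gt;That pattern is easy to express in code. A minimal sketch, assuming the snapshot commit timestamps have already been fetched (for example from the &lt;code&gt;$snapshots&lt;/code&gt; metadata table); the function name is hypothetical:&lt;/p&gt;

```python
from datetime import datetime, timedelta

# Sketch of the "time threshold plus keep last N" retention pattern.
def snapshots_to_expire(committed_ats, older_than, retain_last=2):
    """Timestamps safe to expire: older than the cutoff AND not among
    the newest `retain_last` snapshots kept as a safety net."""
    newest = set(sorted(committed_ats, reverse=True)[:retain_last])
    return [t for t in committed_ats
            if older_than > t and t not in newest]

now = datetime(2024, 6, 1)
history = [now - timedelta(days=d) for d in (0, 1, 10, 40, 90)]
print(len(snapshots_to_expire(history, now - timedelta(days=7))))  # 3
```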

&lt;h4&gt;
  
  
  Example: expire old snapshots in Spark
&lt;/h4&gt;

&lt;p&gt;Here’s a practical Spark SQL example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CALL prod.system.expire\_snapshots(  
  table =&amp;gt; 'db.my\_table',  
  older\_than =&amp;gt; TIMESTAMP '2024-01-01',  
  retain\_last =&amp;gt; 2  
);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This removes snapshots older than the specified timestamp while keeping the most recent snapshots for safety.&lt;/p&gt;

&lt;p&gt;After this runs, many old manifests will become unreferenced — which is exactly what you want &lt;em&gt;before&lt;/em&gt; rewriting manifests.&lt;/p&gt;

&lt;h4&gt;
  
  
  Validate the effect
&lt;/h4&gt;

&lt;p&gt;After expiring snapshots, re-check your metadata:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT COUNT(\*) AS remaining\_snapshots  
FROM prod.db.my\_table$snapshots;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should see a much smaller snapshot set. At this point:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  old manifests are no longer protected,&lt;/li&gt;
&lt;li&gt;  rewrite manifests can actually consolidate metadata,&lt;/li&gt;
&lt;li&gt;  and orphan cleanup will be able to reclaim storage.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Step 3: Run Rewrite Manifests
&lt;/h3&gt;

&lt;p&gt;At this point, the current snapshot references only the metadata that still matters. That gives Iceberg room to consolidate and reorganize manifests instead of carrying forward historical baggage.&lt;/p&gt;

&lt;h4&gt;
  
  
  What this step actually does now
&lt;/h4&gt;

&lt;p&gt;After snapshot expiration, rewrite manifests can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  merge many small manifests into fewer, larger ones,&lt;/li&gt;
&lt;li&gt;  reorganize data-file entries so they’re better clustered by partition and statistics,&lt;/li&gt;
&lt;li&gt;  reduce the amount of metadata the engine has to read during planning.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you skip snapshot expiration, most of these benefits are muted. After expiration, they show up immediately in planning time and metadata size.&lt;/p&gt;

&lt;h4&gt;
  
  
  Running rewrite manifests (Spark example)
&lt;/h4&gt;

&lt;p&gt;In Spark-based environments, this is usually done via a system procedure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CALL prod.system.rewrite\_manifests('db.my\_table');
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This executes the rewrite in parallel across the cluster. Spark will read the existing manifests, generate a new optimized set, and commit a new snapshot that references them.&lt;/p&gt;

&lt;p&gt;The command itself is simple. The impact depends entirely on whether you prepared the table correctly in the earlier steps.&lt;/p&gt;

&lt;h4&gt;
  
  
  Validate that it actually worked
&lt;/h4&gt;

&lt;p&gt;After the rewrite, always check the result:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT COUNT(\*) AS manifest\_count\_after  
FROM prod.db.my\_table$manifests;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should see a noticeable drop in manifest count or, at the very least, fewer very small manifests. If nothing changes, the usual reasons are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  snapshots were not expired, so old manifests are still referenced,&lt;/li&gt;
&lt;li&gt;  the table’s write pattern is fragmenting metadata faster than maintenance runs,&lt;/li&gt;
&lt;li&gt;  or the table is already in a reasonably healthy state.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Why this step is cheap — but not free
&lt;/h4&gt;

&lt;p&gt;Rewrite manifests does not rewrite data files, so it’s much cheaper than compaction. However, it still:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  reads all manifests referenced by the current snapshot,&lt;/li&gt;
&lt;li&gt;  writes new manifest files,&lt;/li&gt;
&lt;li&gt;  and commits new metadata.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;On large tables with many manifests, this can still consume noticeable CPU, memory, and I/O. That’s why you should not run it blindly across hundreds of tables at once.&lt;/p&gt;

&lt;h4&gt;
  
  
  Practical scheduling guidance
&lt;/h4&gt;

&lt;p&gt;If you’re doing this manually:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  avoid peak query hours,&lt;/li&gt;
&lt;li&gt;  stagger rewrites across tables,&lt;/li&gt;
&lt;li&gt;  and gate execution on actual metadata health signals rather than time alone.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Running rewrite manifests selectively, when metadata drift is real, is what keeps it a high-ROI operation instead of background noise.&lt;/p&gt;
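&lt;p&gt;A minimal scheduling gate might look like the following sketch. The peak-hour window and fragmentation thresholds are assumptions to adapt to your workload, and the function name is hypothetical:&lt;/p&gt;

```python
# Hypothetical gate: run rewrite_manifests only when metadata health
# signals justify it and the cluster is off-peak. Thresholds and the
# peak window (06:00-22:00 UTC) are illustrative assumptions.
def should_rewrite(manifest_count: int,
                   small_manifest_fraction: float,
                   hour_utc: int) -> bool:
    off_peak = 6 > hour_utc or hour_utc >= 22
    fragmented = manifest_count > 200 and small_manifest_fraction > 0.5
    return off_peak and fragmented

print(should_rewrite(2000, 0.7, 3))    # fragmented, off-peak: True
print(should_rewrite(2000, 0.7, 14))   # peak hours: False
```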

&lt;h3&gt;
  
  
  Step 4: Remove Orphan Files
&lt;/h3&gt;

&lt;p&gt;Once snapshots are expired and manifests are rewritten, you need to clean up what is no longer referenced.&lt;/p&gt;

&lt;p&gt;Orphan files are data or metadata files that exist in storage but are no longer referenced by any snapshot. They typically appear after snapshot expiration, manifest rewrites, failed jobs, or aborted commits. Iceberg does not delete these files automatically, because doing so without coordination would risk correctness.&lt;/p&gt;

&lt;p&gt;If you stop after rewriting manifests, those unreferenced files will remain in object storage indefinitely.&lt;/p&gt;

&lt;p&gt;From Iceberg’s point of view, everything is correct after a rewrite. From your cloud bill’s point of view, nothing changed.&lt;/p&gt;

&lt;p&gt;Skipping orphan cleanup is one of the most common reasons teams see storage costs grow even though they “ran all the maintenance jobs.” The metadata graph is clean, but the physical files are still sitting in S3, GCS, or ADLS.&lt;/p&gt;

&lt;p&gt;This step is what turns logical cleanup into actual cost reduction.&lt;/p&gt;

&lt;h4&gt;
  
  
  What orphan cleanup actually removes
&lt;/h4&gt;

&lt;p&gt;Orphan cleanup removes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  old manifest files no longer referenced by any snapshot,&lt;/li&gt;
&lt;li&gt;  metadata files left behind by rewrites and expired snapshots,&lt;/li&gt;
&lt;li&gt;  data files created by failed or rolled-back writes.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It does &lt;strong&gt;not&lt;/strong&gt; remove any file that is reachable from a live snapshot. If a file is still referenced, it stays.&lt;/p&gt;

&lt;h4&gt;
  
  
  Running orphan cleanup (Spark example)
&lt;/h4&gt;

&lt;p&gt;In Spark environments, orphan cleanup is typically done with a system procedure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CALL prod.system.remove\_orphan\_files(  
  table \=\&amp;gt; 'db.my\_table',  
  older\_than \=\&amp;gt; TIMESTAMP '2024-01-01'  
);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;older_than&lt;/code&gt; guard is critical. It ensures Iceberg only deletes files older than a safe cutoff, protecting against races with in-flight or recently committed jobs.&lt;/p&gt;

&lt;p&gt;Never run orphan cleanup without a time threshold.&lt;/p&gt;

&lt;h4&gt;
  
  
  Validate the effect
&lt;/h4&gt;

&lt;p&gt;After cleanup, you should see:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  a reduction in storage usage over time,&lt;/li&gt;
&lt;li&gt;  fewer unreferenced metadata files,&lt;/li&gt;
&lt;li&gt;  and no change in query results or correctness.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Storage metrics won’t always drop instantly due to object store reporting delays, but the trend should flatten instead of creeping upward.&lt;/p&gt;

&lt;h4&gt;
  
  
  The key takeaway
&lt;/h4&gt;

&lt;p&gt;Snapshot expiration and manifest rewrites clean up &lt;strong&gt;logical metadata&lt;/strong&gt;. Orphan cleanup is what turns that into &lt;strong&gt;physical cleanup&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;If you skip this step, maintenance looks successful on paper but storage costs keep rising. If you include it consistently, metadata maintenance finally translates into real, measurable savings.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 5: Coordinate with Compaction
&lt;/h3&gt;

&lt;p&gt;Manifest rewrites optimize &lt;em&gt;how files are described&lt;/em&gt;. Compaction optimizes &lt;em&gt;how many files exist&lt;/em&gt;. If you ignore compaction, manifest rewrites will help briefly — then fragmentation will return.&lt;/p&gt;

&lt;p&gt;Small data files are the main upstream cause of manifest churn. Every time a write job produces many small files, Iceberg must record them in metadata. Even if you rewrite manifests perfectly, frequent small-file writes will recreate fragmentation within days.&lt;/p&gt;

&lt;p&gt;The optimal solution is to use an &lt;a href="https://lakeops.dev" rel="noopener noreferrer"&gt;Iceberg Control Plane&lt;/a&gt;. LakeOps, for example, compacts data up to 95% faster and cheaper than alternatives thanks to a Rust-based engine and analysis of cross-system data. Compaction is also smarter, so query times and costs both drop dramatically.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;![](https://cdn-images-1.medium.com/max/1600/1*7w1IT-CzuDQRQsVuCionDA.png)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Compaction optimization with a control plane (source: lakeops.dev)&lt;/p&gt;

&lt;p&gt;In LakeOps, compaction processes are also synchronized with maintenance processes like manifest rewrites, so everything runs smoothly and you don’t have to connect and coordinate it yourself. Results are optimized and are usually far better than a home-made solution.&lt;/p&gt;

&lt;h4&gt;
  
  
  Example: why rewrites alone don’t hold
&lt;/h4&gt;

&lt;p&gt;Let’s go back to the core problem for a second, and then see how to manually address it if you don’t use a control plane.&lt;/p&gt;

&lt;p&gt;Consider a table with streaming ingestion committing every few minutes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Each commit writes 20–50 small Parquet files.&lt;/li&gt;
&lt;li&gt;  Each commit creates one or more new manifests.&lt;/li&gt;
&lt;li&gt;  After a week, the table has thousands of data files and hundreds of manifests.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You run snapshot expiration and rewrite manifests. Planning time improves.&lt;/p&gt;

&lt;p&gt;Two days later, the table is slow again.&lt;/p&gt;

&lt;p&gt;Nothing is broken. The write pattern simply recreated the same metadata pressure. This is what happens when compaction is missing or misaligned.&lt;/p&gt;

&lt;h4&gt;
  
  
  Use metadata to confirm compaction pressure
&lt;/h4&gt;

&lt;p&gt;Before scheduling more manifest rewrites, check whether small files are the real problem:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT  
  COUNT(\*) AS file\_count,  
  AVG(file\_size\_in\_bytes) / 1024 / 1024 AS avg\_file\_mb  
FROM prod.db.my\_table$files;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the average file size is far below your engine’s sweet spot, manifest rewrites are treating symptoms, not the cause.&lt;/p&gt;
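&lt;p&gt;As a sketch, the “sweet spot” comparison can be encoded like this. The 128 MB target and the 25% cutoff are illustrative assumptions (for reference, Iceberg’s default write target is 512 MB, and engines vary):&lt;/p&gt;

```python
# Sketch: flag compaction pressure when the table has many files and
# their average size sits far below the target. Thresholds are
# illustrative assumptions, not engine defaults.
def compaction_pressure(file_count: int, avg_file_mb: float,
                        target_mb: float = 128.0) -> bool:
    return file_count > 1000 and target_mb * 0.25 > avg_file_mb

print(compaction_pressure(5000, 8.0))    # tiny files: True
print(compaction_pressure(5000, 300.0))  # healthy sizes: False
```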

&lt;p&gt;Another useful signal:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT  
  COUNT(\*) AS manifests,  
  SUM(added\_data\_files\_count) AS total\_files\_tracked  
FROM prod.db.my\_table$manifests;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If file counts are high and keep growing quickly, metadata pressure will return unless compaction slows it down.&lt;/p&gt;

&lt;h4&gt;
  
  
  How compaction stabilizes manifest rewrites
&lt;/h4&gt;

&lt;p&gt;When compaction is running correctly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  fewer data files are created per write cycle,&lt;/li&gt;
&lt;li&gt;  manifests grow more slowly and stay denser,&lt;/li&gt;
&lt;li&gt;  rewrite manifests becomes an occasional cleanup, not a recurring firefight.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In stable tables, teams often find that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  compaction runs frequently (or continuously),&lt;/li&gt;
&lt;li&gt;  manifest rewrites run infrequently,&lt;/li&gt;
&lt;li&gt;  snapshot expiration runs regularly.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That balance is what keeps planning predictable.&lt;/p&gt;

&lt;h4&gt;
  
  
  Practical coordination rule
&lt;/h4&gt;

&lt;p&gt;In production, a simple rule holds up well: If manifest rewrites are needed often, compaction is not doing enough.&lt;/p&gt;

&lt;p&gt;If you find yourself rewriting manifests weekly or daily on the same tables, it’s usually a sign that upstream file layout is unstable.&lt;/p&gt;

&lt;h4&gt;
  
  
  Should you add compaction code here?
&lt;/h4&gt;

&lt;p&gt;At this point in the guide, &lt;strong&gt;full compaction code examples are usually not helpful&lt;/strong&gt;. Compaction is engine-specific, workload-specific, and already well-covered elsewhere. What matters here is understanding the dependency:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  compaction reduces metadata churn,&lt;/li&gt;
&lt;li&gt;  reduced churn makes manifest rewrites effective,&lt;/li&gt;
&lt;li&gt;  without compaction, rewrites are temporary relief.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That mental model is more valuable than a generic compaction snippet.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 6: Add policies and guardrails
&lt;/h3&gt;

&lt;p&gt;If you manage maintenance with scripts or schedulers, automation needs guardrails or it will drift out of alignment with reality.&lt;/p&gt;

&lt;p&gt;The simplest way is to add a &lt;a href="https://lakeops.dev" rel="noopener noreferrer"&gt;&lt;strong&gt;Control Plane&lt;/strong&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Then you can define policies per table or for your entire lake.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgy7vm5m1c8trzkylvrtb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgy7vm5m1c8trzkylvrtb.png" width="800" height="314"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Define maintenance policies with a control plane (source: lakeops.dev)&lt;/p&gt;

&lt;p&gt;Start by skipping inactive tables. Tables that are rarely queried or written to don’t need aggressive maintenance. Running rewrites on them just burns cluster resources without benefit.&lt;/p&gt;

&lt;p&gt;Avoid peak query hours. Even though manifest rewrites are cheaper than data compaction, they still consume CPU, memory, and I/O. Running them during high query load increases contention and hurts user-facing performance.&lt;/p&gt;

&lt;p&gt;Trigger maintenance based on &lt;strong&gt;observed metadata health&lt;/strong&gt;, not time alone. Manifest count, average files per manifest, snapshot growth, and planning time trends are far better signals than “once a day” or “once a week”.&lt;/p&gt;

&lt;p&gt;Finally, expect thresholds to change. Write patterns evolve, query behavior shifts, and what worked six months ago may be wrong today. Scripts that never get revisited slowly turn into background noise or, worse, a source of instability.&lt;/p&gt;

&lt;p&gt;This is the point where many teams decide that maintaining guardrails manually is more work than it’s worth and move metadata maintenance into a control plane.&lt;/p&gt;

&lt;h3&gt;
  
  
  Practical steps to take
&lt;/h3&gt;

&lt;p&gt;Rewrite manifests is one of those Iceberg operations that looks optional until it isn’t. When metadata is healthy, planning is fast and predictable. When it drifts, everything still works — just slower and more expensively. That’s why manifest issues often go unnoticed for a long time.&lt;/p&gt;

&lt;p&gt;In this guide, we walked through what manifest files actually are, why they fragment in real systems, and what rewrite manifests really does under the hood. We covered when rewrites are worth running, why snapshot expiration has to come first, how orphan cleanup turns logical cleanup into real cost savings, and why compaction is the long-term stabilizer that keeps metadata from degrading again.&lt;/p&gt;

&lt;p&gt;If you’re managing this manually, the step-by-step approach will get you there. If you want this handled continuously and optimized, a &lt;a href="https://lakeops.dev" rel="noopener noreferrer"&gt;&lt;strong&gt;Control Plane&lt;/strong&gt;&lt;/a&gt; exists to do exactly that: operating and optimizing Iceberg tables based on real workload behavior instead of fixed schedules.&lt;/p&gt;

&lt;p&gt;The big takeaway is that rewrite manifests only works well as part of a &lt;strong&gt;coordinated maintenance loop&lt;/strong&gt;. Run it in isolation and the benefits are usually temporary. Pair it with snapshot expiration, compaction, and cleanup, and it becomes one of the highest-ROI metadata optimizations Iceberg offers.&lt;/p&gt;

&lt;p&gt;If you’ve run into edge cases, different patterns, or lessons learned the hard way, feel free to share them in the comments.&lt;/p&gt;

&lt;p&gt;Thanks for reading, and hope this helps keep your Iceberg tables healthy, fast, predictable, and boring in all the right ways.&lt;/p&gt;

&lt;p&gt;Cheers 🍺&lt;/p&gt;

&lt;h3&gt;
  
  
  Learn more
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://overcast.blog/11-iceberg-performance-optimizations-you-should-know-d9aef7aab235" rel="noopener noreferrer"&gt;&lt;strong&gt;11 Iceberg Performance Optimizations You Should Know&lt;/strong&gt;&lt;/a&gt;&lt;a href="https://overcast.blog/11-iceberg-performance-optimizations-you-should-know-d9aef7aab235" rel="noopener noreferrer"&gt;&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://overcast.blog/13-apache-iceberg-optimizations-you-should-know-85bc25690f00" rel="noopener noreferrer"&gt;&lt;strong&gt;13 Apache Iceberg Optimizations You Should Know&lt;/strong&gt;&lt;/a&gt;&lt;a href="https://overcast.blog/13-apache-iceberg-optimizations-you-should-know-85bc25690f00" rel="noopener noreferrer"&gt;&lt;/a&gt;.  &lt;/p&gt;

&lt;p&gt;&lt;a href="https://overcast.blog/11-apache-iceberg-cost-reduction-strategies-you-should-know-8de7acb14151" rel="noopener noreferrer"&gt;&lt;strong&gt;11 Apache Iceberg Cost Reduction Strategies You Should Know&lt;/strong&gt;&lt;/a&gt;&lt;a href="https://overcast.blog/11-apache-iceberg-cost-reduction-strategies-you-should-know-8de7acb14151" rel="noopener noreferrer"&gt;&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://overcast.blog/9-data-lake-cost-optimization-tools-you-should-know-2a5995be8f4b" rel="noopener noreferrer"&gt;&lt;strong&gt;9 Data Lake Cost Optimization Tools You Should Know&lt;/strong&gt;&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://overcast.blog/9-data-lake-cost-optimization-tools-you-should-know-2a5995be8f4b" rel="noopener noreferrer"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>11 Must-Know FrontEnd Trends for 2020</title>
      <dc:creator>Joni Sar</dc:creator>
      <pubDate>Sun, 29 Dec 2019 12:30:59 +0000</pubDate>
      <link>https://dev.to/jonisar/11-must-know-frontend-trends-for-2020-13e1</link>
      <guid>https://dev.to/jonisar/11-must-know-frontend-trends-for-2020-13e1</guid>
      <description>&lt;h3&gt;
  
  
  Or: how to sound smart in frontend lunch conversations!
&lt;/h3&gt;

&lt;p&gt;Sounding smart at your team's lunch talks is obviously a great reason to stay updated with the latest frontend trends. It might even help you become a better developer, build better technology and better products. Maybe.&lt;/p&gt;

&lt;p&gt;So, please allow me to make this honorable quest easier by pointing you in a few interesting directions. I won’t explain every concept from A to Z, but I’ll introduce each one, explain how it’s useful, and point you to further resources.&lt;/p&gt;

&lt;p&gt;For example, we’ll briefly cover an introduction to Micro Frontends, Atomic Design, Web Components, the TypeScript take-over, ESM CDNs and even design tokens. Feel free to scroll through and mark the topics you’d like to learn more about. For any questions or more suggestions, just drop a comment below. &lt;/p&gt;

&lt;p&gt;Short disclaimer: I'm on the team building Bit. This doesn't make any of the following less true though. Enjoy!&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Micro frontends
&lt;/h2&gt;

&lt;p&gt;Micro Frontends are the buzziest frontend topic for lunch conversations.&lt;br&gt;
Ironically, while frontend development enjoys the modular advantages of components, it is still largely more monolithic than backend microservices.&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--5RM_mJgL--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/2000/1%2ASdrrxeKfuAyDEAKATFNUNg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--5RM_mJgL--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/2000/1%2ASdrrxeKfuAyDEAKATFNUNg.png" alt=""&gt;&lt;/a&gt;&lt;br&gt;
Micro frontends bring the promise of splitting your frontend architecture into different frontends for different teams working on different parts of your app. Each team can gain autonomy over the end-to-end lifecycle of their micro frontend, which can be developed, versioned, tested, built, rendered, updated and deployed independently (using &lt;a href="https://bit.dev"&gt;tools like Bit&lt;/a&gt; for example).&lt;br&gt;
Instead of explaining the whole concept here, &lt;a href="https://martinfowler.com/articles/micro-frontends.html#InANutshell"&gt;&lt;strong&gt;read this great post&lt;/strong&gt;&lt;/a&gt; by &lt;a class="comment-mentioned-user" href="https://dev.to/thecamjackson"&gt;@thecamjackson&lt;/a&gt;
 published at the @martinfowler blog. It’s really good and should cover everything you need to start digging into this concept.&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--kAQ3L-9K--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/2000/1%2AfxACkCp1y_fDwnF-N7bVMQ.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--kAQ3L-9K--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/2000/1%2AfxACkCp1y_fDwnF-N7bVMQ.png" alt=""&gt;&lt;/a&gt;&lt;br&gt;
However, there are still certain gaps in today’s ecosystem. Mostly, people are concerned about issues like deploying separate frontends, bundling, environment differences, etc. &lt;a href="https://bit.dev"&gt;Bit&lt;/a&gt; already lets you isolate, version, build, test and update individual frontends/components. For now, this is mainly useful when working with multiple applications (though it’s already commonly used for gradually refactoring parts of existing apps via components).&lt;br&gt;
When Bit introduces deployments in 2020, independent teams will gain the power to develop, compose, version, deploy and update standalone frontends. It will let you compose UI apps together and let teams create simple, decoupled codebases with independent continuous deployments and incremental upgrades. The composition of these frontends will end up creating your application. Here’s what a UI app composed with Bit looks like:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--j0oWJyZI--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/v2nf316tdaw9nkxw7mng.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--j0oWJyZI--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/v2nf316tdaw9nkxw7mng.png" alt="Composed UI app"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Learn more:&lt;br&gt;
&lt;a href="https://martinfowler.com/articles/micro-frontends.html"&gt;Micro Frontends - Martin Fowler&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  2. Atomic Design
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://blog.bitsrc.io/atomic-design-and-ui-components-theory-to-practice-f200db337c24"&gt;&lt;br&gt;
&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--usZNK7ni--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/6528/1%2Aq5IW7xZF8AYFj8NZEVi17Q.jpeg"&gt;&lt;br&gt;
&lt;/a&gt; &lt;br&gt;
&lt;a href="https://bradfrost.com/blog/post/atomic-web-design/"&gt;Atomic Design&lt;/a&gt; is yet another super interesting topic for lunch talks, which I like to think about more of as a philosophy than a pure methodology.&lt;br&gt;
Simply put, the theory introduced by Brad Frost compares the composition of web applications to the natural composition of atoms, molecules, organisms and so on, ending with concrete web pages. Atoms compose molecules (e.g. text-input + button + label atoms = search molecule). Molecules compose an organism. Organisms live in a layout template, which can be concretized into a page delivered to your users.&lt;br&gt;
Here’s a &lt;a href="https://blog.bitsrc.io/atomic-design-and-ui-components-theory-to-practice-f200db337c24?"&gt;detailed 30-second explanation with visual examples&lt;/a&gt;. It includes very impressive drawings I made with great artistic talent, which you can copy-paste to your office board 😆&lt;br&gt;
The advantages of Atomic components go beyond building modular UI applications from reusable parts. This paradigm forces you to think in composition, so you better understand the role and API of every component, their hierarchy, and how to abstract the building process of your application in an effective and efficient way. &lt;a href="https://bradfrost.com/blog/post/atomic-web-design/"&gt;Take a look.&lt;/a&gt;&lt;/p&gt;
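&lt;p&gt;To make the idea concrete, here’s a tiny, hypothetical sketch (all names are illustrative) of how atoms compose molecules and molecules compose organisms as plain data:&lt;/p&gt;

```javascript
// Hypothetical sketch of atomic composition; every name here is illustrative.
const atom = (type, props) => ({ type, props });

// Molecule: a search bar composed of three atoms.
const searchBar = () => ({
  type: 'search-bar',
  children: [
    atom('text-input', { placeholder: 'Search…' }),
    atom('button', { label: 'Go' }),
    atom('label', { text: 'Search' }),
  ],
});

// Organism: a page header composed of molecules.
const header = () => ({ type: 'header', children: [searchBar()] });
```

&lt;p&gt;The same shape scales upward: organisms slot into templates, and templates are concretized into the pages your users see.&lt;/p&gt;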
&lt;h2&gt;
  
  
  3. Encapsulated Styling and Shadow Dom
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Ud3q7udi--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/2276/1%2ATSOpITlAqbyYC_UYYW7zMg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Ud3q7udi--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/2276/1%2ATSOpITlAqbyYC_UYYW7zMg.png" alt="Source: developer.mozzila.org"&gt;&lt;/a&gt;&lt;em&gt;Source: developer.mozzila.org&lt;/em&gt;&lt;br&gt;
An important aspect of components is encapsulation — being able to keep the markup structure, style, and behavior hidden and separate from other code on the page so that different parts do not clash, and the code can be kept nice and clean. &lt;a href="https://developer.mozilla.org/en-US/docs/Web/Web_Components/Using_shadow_DOM"&gt;The Shadow DOM API&lt;/a&gt; is a key part of this, providing a way to attach a hidden separated DOM to an element.&lt;br&gt;
&lt;em&gt;Shadow&lt;/em&gt; DOM has actually been used by browsers for a long time now. You &lt;a href="https://bitsofco.de/what-is-the-shadow-dom/"&gt;can think of the shadow DOM &lt;/a&gt;as a “DOM within a DOM”: its own isolated DOM tree with its own elements and styles, completely separate from the original DOM.&lt;br&gt;
It allows hidden DOM trees to be attached to elements in the regular DOM tree: this shadow DOM tree starts with a shadow root, underneath which you can attach any elements you want, in the same way as in the normal DOM. The &lt;a href="https://dev.to/maxart2501/css-for-an-encapsulated-web-7fo"&gt;main implication&lt;/a&gt; of this is that we have &lt;em&gt;no need for a namespace&lt;/em&gt; for our classes, as there’s no risk of name clashing or style spilling. There are additional advantages as well. It is often referred to as the long-promised solution for true encapsulation of styles in web components. Learn more:&lt;br&gt;
&lt;a href="https://developer.mozilla.org/en-US/docs/Web/Web_Components/Using_shadow_DOM"&gt;Using shadow DOM&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  4. The TypeScript takeover
&lt;/h2&gt;

&lt;p&gt;So lately, every conversation &lt;a href="https://medium.com/@jtomaszewski/why-typescript-is-the-best-way-to-write-front-end-in-2019-feb855f9b164"&gt;makes it sound like TS is taking over&lt;/a&gt; frontend development. It is reported that &lt;a href="https://2018.stateofjs.com/javascript-flavors/typescript/"&gt;80% of developers say they would like to use or learn TypeScript in their next project&lt;/a&gt;.&lt;br&gt;
Although it has its shortcomings, TS code is easier to understand, faster to implement, produces fewer bugs and requires less boilerplate. Want to refactor your React app to work with TS? Go for it. Want to start gradually? Use tools like &lt;a href="https://github.com/teambit/bit"&gt;Bit&lt;/a&gt; to gradually refactor components in your app to TS and use the &lt;a href="https://bit.dev/bit/envs/compilers/react-typescript"&gt;React-TypeScript compiler&lt;/a&gt; to build them independently of your app. This way you can gradually upgrade your code one component at a time.&lt;br&gt;
Learn more:&lt;br&gt;
&lt;a href="https://medium.com/@jtomaszewski/why-typescript-is-the-best-way-to-write-front-end-in-2019-feb855f9b164"&gt;Why TypeScript is the best way to write Front-end in 2019And why you should convince everybody to use it.&lt;/a&gt;&lt;br&gt;
&lt;a href="https://eng.lyft.com/typescript-at-lyft-64f0702346ea"&gt;TypeScript at Lyft&lt;/a&gt;&lt;br&gt;
&lt;a href="https://slack.engineering/typescript-at-slack-a81307fa288d"&gt;TypeScript at Slack&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  5. Web Components: Stencil, Svelte, Lit &amp;amp; friends!
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--tnwswldm--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/3200/1%2A-zkpV1IfOv-1dux6ZqWBCQ.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--tnwswldm--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/3200/1%2A-zkpV1IfOv-1dux6ZqWBCQ.png" alt=""&gt;&lt;/a&gt;&lt;br&gt;
So basically, this is the future. Why? Because these pure web components are framework-agnostic and can work without a framework or with any framework, which spells &lt;strong&gt;standardization&lt;/strong&gt;. Because they are free from JS fatigue and are supported by modern browsers. And because their bundle size and consumption can be kept optimal, with impressive rendering performance.&lt;br&gt;
These components build on Custom Elements, a JavaScript API that allows you to define a new kind of HTML tag, HTML templates to specify layouts, and of course the Shadow DOM, which is component-specific by nature.&lt;br&gt;
Prominent tools to know in this space are &lt;a href="https://github.com/Polymer/lit-html"&gt;Lit-html&lt;/a&gt; (and &lt;a href="https://lit-element.polymer-project.org/"&gt;Lit-element&lt;/a&gt;), &lt;a href="https://github.com/ionic-team/stencil"&gt;StencilJS&lt;/a&gt;, &lt;a href="https://github.com/sveltejs/svelte"&gt;SvelteJS&lt;/a&gt; and of course &lt;a href="https://bit.dev/"&gt;Bit&lt;/a&gt;, for reusable modular components which can be directly shared, consumed and developed anywhere.&lt;br&gt;
When thinking of the future of our UI development, and of how the principles of modularity, reusability, encapsulation and standardization should look in the era of components, web components are the answer. Learn more:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://blog.bitsrc.io/7-tools-for-developing-web-components-in-2019-1d5b7360654d"&gt;7 Tools for Developing Web Components in 2019&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://blog.bitsrc.io/9-web-component-ui-libraries-you-should-know-in-2019-9d4476c3f103"&gt;9 Web Components UI Libraries You Should Know in 2019&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://blog.bitsrc.io/prototyping-with-web-components-build-an-rss-reader-5bb753508d48"&gt;Prototyping with Web Components: Build an RSS Reader&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
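&lt;p&gt;For a feel of the underlying API, here’s a minimal, hypothetical custom element; the greeting logic is plain JS, and registration is guarded so the sketch also loads outside a browser:&lt;/p&gt;

```javascript
// Plain logic kept separate from the DOM so it is easy to reuse and test.
const greet = (name) => `Hello, ${name}!`;

if (typeof HTMLElement !== 'undefined') {
  // Defines a new kind of HTML tag, usable as <hello-card name="Ada"></hello-card>.
  class HelloCard extends HTMLElement {
    connectedCallback() {
      const shadow = this.attachShadow({ mode: 'open' }); // scoped DOM
      const p = document.createElement('p');
      p.textContent = greet(this.getAttribute('name') || 'world');
      shadow.appendChild(p);
    }
  }
  customElements.define('hello-card', HelloCard);
}
```

&lt;p&gt;Because it’s a native element, any framework (or none at all) can render it: that’s the standardization point above.&lt;/p&gt;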
&lt;h2&gt;
  
  
  6. From component libraries to dynamic collections
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--MxwQnLBi--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/2000/1%2AVmerRS_ufSltgSGYiNHinQ.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--MxwQnLBi--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/2000/1%2AVmerRS_ufSltgSGYiNHinQ.png" alt="Organize components in dynamic collections; reuse, compose, stay independent"&gt;&lt;/a&gt;&lt;em&gt;Organize components in dynamic collections; reuse, compose, stay independent&lt;/em&gt;&lt;br&gt;
The emergence of &lt;a href="https://blog.bitsrc.io/a-guide-to-component-driven-development-cdd-69dbd3d07bf0?source=collection_home---4------13-----------------------"&gt;component-driven development&lt;/a&gt; gave birth to a variety of tools. One prominent tool is &lt;a href="https://github.com/teambit/bit"&gt;Bit&lt;/a&gt;, alongside its hosting platform &lt;a href="https://bit.dev"&gt;Bit.dev&lt;/a&gt;.&lt;br&gt;
Instead of working hard to build a cumbersome, highly coupled component library, use Bit to continuously isolate and export existing components into a dynamically reusable shared collection.&lt;br&gt;
Using &lt;a href="https://github.com/teambit/bit"&gt;Bit (GitHub)&lt;/a&gt; you can independently isolate, version, build, test and update UI components. It streamlines the process of isolating a component in an existing app, exporting it to a remote collection, and using it anywhere. Every component can build, test, and render outside of any project. You can update a single component (and its dependents) rather than the whole app.&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--ost0MZ7C--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://cdn-images-1.medium.com/max/2000/1%2Ac6475ieLqqEzb4htt3T94Q.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ost0MZ7C--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://cdn-images-1.medium.com/max/2000/1%2Ac6475ieLqqEzb4htt3T94Q.gif" alt=""&gt;&lt;/a&gt;&lt;br&gt;
On the bit.dev platform (or on your own server) your components can be remotely hosted and organized for different teams, so that every team can control the development of its own components. Every team can share and reuse components while keeping its independence and control.&lt;br&gt;
The platform also provides an all-in-one ecosystem for shared components out of the box: it auto-documents UI components, renders components in an interactive playground, and even provides a built-in registry to install components using npm/yarn. In addition, you can use bit import to bring components into any repository for modification.&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--qVhfkPYZ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://cdn-images-1.medium.com/max/2000/1%2ARZP_jNEEilVtmjGH4O4UHQ.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--qVhfkPYZ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://cdn-images-1.medium.com/max/2000/1%2ARZP_jNEEilVtmjGH4O4UHQ.gif" alt=""&gt;&lt;/a&gt;&lt;br&gt;
In the short run, this revolutionizes the process of sharing and composing components, much like Spotify and iTunes changed how music is shared, which previously happened through static CD albums. It’s a dynamic and modular solution that lets everyone share and use components together.&lt;br&gt;
In the long run, Bit helps pave the way to micro frontends. Why? Because it already lets you independently version, test, build and update parts of your UI application. In 2020 it will introduce independent deployments, which will finally allow different teams to own parts of your apps end to end: keep decoupled and simple codebases, let teams cautiously and continuously build and deploy incremental UI upgrades, and compose frontends together.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://bit.dev"&gt;Share reusable code components as a team&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://bit.dev/collections"&gt;UI Component design systems&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  7. State management: Bye Bye Redux? (Not….)
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--00YNB73y--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/2290/1%2A6oeKSYnPG2pbg8vdaiteYg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--00YNB73y--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/2290/1%2A6oeKSYnPG2pbg8vdaiteYg.png" alt=""&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://blog.bitsrc.io/state-of-react-state-management-in-2019-779647206bbc"&gt;Redux is a hard beast to kill&lt;/a&gt;. While the pains of globally managing states in your app are becoming more clear as frontend becomes more modular, the sheer usefulness of Redux makes it a go-to solution for many teams.&lt;br&gt;
So will we say bye-bye to Redux in 2020? Probably not entirely 😄&lt;br&gt;
However, the rise of new features within frameworks that handle state (React Hooks, the Context API, etc.) is paving the way to a future without a global store. Tools like &lt;a href="https://github.com/mobxjs/mobx"&gt;MobX&lt;/a&gt;, which only a year ago saw rather scarce adoption, are becoming more popular every day thanks to their component-oriented and scalable nature. You can explore &lt;a href="https://blog.bitsrc.io/state-of-react-state-management-in-2019-779647206bbc"&gt;more alternatives here&lt;/a&gt;.&lt;br&gt;
&lt;em&gt;Read&lt;/em&gt;: &lt;a href="https://medium.com/@dan_abramov/making-sense-of-react-hooks-fdbde8803889"&gt;Making Sense of React Hooks&lt;/a&gt; by Dan Abramov&lt;/p&gt;
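&lt;p&gt;To see why a global store is both useful and replaceable, here’s a minimal Redux-style store sketched from scratch (illustrative only, not the actual Redux API):&lt;/p&gt;

```javascript
// A tiny Redux-style store: single state object, pure reducer, subscriptions.
function createStore(reducer, initialState) {
  let state = initialState;
  const listeners = [];
  return {
    getState: () => state,
    dispatch(action) {
      state = reducer(state, action); // the reducer computes the next state
      listeners.forEach((listener) => listener(state));
    },
    subscribe(listener) {
      listeners.push(listener);
    },
  };
}

// A pure reducer for a counter.
const counter = (state, action) =>
  action.type === 'increment' ? { count: state.count + 1 } : state;

const store = createStore(counter, { count: 0 });
store.dispatch({ type: 'increment' }); // state becomes { count: 1 }
```

&lt;p&gt;Hooks and the Context API give you the same dispatch/subscribe loop per component tree, which is exactly why a single global store is no longer the only option.&lt;/p&gt;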
&lt;h2&gt;
  
  
  8. ESM CDN
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s---ahkLvgh--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/4000/1%2AdSWVWelaiGClQXD6nGhBuA.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s---ahkLvgh--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/4000/1%2AdSWVWelaiGClQXD6nGhBuA.jpeg" alt=""&gt;&lt;/a&gt;&lt;br&gt;
ES Modules are the standard for working with modules in the browser, standardized by ECMAScript. Using ES modules you can easily encapsulate functionality into modules which can be consumed via CDN and similar channels. With the release of Firefox 60, all &lt;a href="https://hacks.mozilla.org/2018/03/es-modules-a-cartoon-deep-dive/"&gt;major browsers will support&lt;/a&gt; ES modules, and the Node team is working on adding ES module support to &lt;a href="https://nodejs.org/en/"&gt;Node.js&lt;/a&gt;. Also, &lt;a href="https://www.youtube.com/watch?v=qR_b5gajwug"&gt;ES module integration for WebAssembly&lt;/a&gt; is coming in the next few years. Just imagine modular &lt;a href="https://github.com/teambit/bit"&gt;Bit&lt;/a&gt; UI components composed into your app via CDN…&lt;/p&gt;
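&lt;p&gt;As a small sketch of the mechanics, a module can be loaded straight from a URL with dynamic import; here a data: URL stands in for a CDN address:&lt;/p&gt;

```javascript
// A one-line module; in practice this source would live on a CDN.
const source = 'export const greet = (name) => `Hi, ${name}`;';
const url = 'data:text/javascript,' + encodeURIComponent(source);

// Dynamic import works in modern browsers and in Node alike.
import(url)
  .then((mod) => console.log(mod.greet('CDN'))) // logs "Hi, CDN"
  .catch((err) => console.error('module load failed', err));
```

&lt;p&gt;Swap the data: URL for a real CDN address and the consuming code does not change at all.&lt;/p&gt;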

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://hacks.mozilla.org/2018/03/es-modules-a-cartoon-deep-dive/"&gt;ES modules: A cartoon deep-dive — Mozilla Hacks — the Web developer blog&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/denoland/deno"&gt;denoland/deno&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  9. Progressive web apps. Still growing.
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://developers.google.com/web/progressive-web-apps"&gt;Progressive web applications&lt;/a&gt; take advantage of the latest technologies to &lt;a href="https://www.smashingmagazine.com/2016/08/a-beginners-guide-to-progressive-web-apps/"&gt;combine the best of web and mobile apps&lt;/a&gt;. Think of it as a website built using web technologies but that acts and feels like an app. Recent advancements in the browser and in the availability of service workers and in the Cache and Push APIs have enabled web developers to allow users to install web apps to their home screen, receive push notifications and even work offline.&lt;br&gt;
Since PWAs provide an intimate user experience and because all network requests can be intercepted through service workers, it is imperative that the app be hosted over HTTPS to prevent man-in-the-middle attacks, which also spells better security. Here’s a great talk by Facebook developer &lt;a href="https://dev.toundefined"&gt;Omer Goldberg&lt;/a&gt; outlining best practices for PWAs.&lt;br&gt;
&lt;/p&gt;
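&lt;p&gt;Registration itself is only a few lines. A hedged sketch, where the '/sw.js' path is hypothetical and the browser-only call is guarded:&lt;/p&gt;

```javascript
// Feature detection kept as a plain function so it is easy to test.
function canUseServiceWorker(env) {
  return Boolean(env) && 'serviceWorker' in env;
}

if (typeof navigator !== 'undefined' && canUseServiceWorker(navigator)) {
  // '/sw.js' is a hypothetical path to your service worker script.
  navigator.serviceWorker
    .register('/sw.js')
    .then((registration) => console.log('SW registered, scope:', registration.scope))
    .catch((err) => console.error('SW registration failed', err));
}
```

&lt;p&gt;The worker script itself is where caching and push handling live; registration just hands it control over requests within its scope.&lt;/p&gt;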
&lt;h2&gt;
  
  
  10. Designer-developer integrations
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--MZ8AwG4x--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://cdn-images-1.medium.com/max/2000/1%2A55RGwH_5D3mIZoVhSCXWOA.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--MZ8AwG4x--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://cdn-images-1.medium.com/max/2000/1%2A55RGwH_5D3mIZoVhSCXWOA.gif" alt=""&gt;&lt;/a&gt;&lt;br&gt;
With the rise of &lt;a href="https://dev.to/jonisar/ui-component-design-system-a-developer-s-guide-19fg"&gt;component-driven design systems&lt;/a&gt; meant to enable a &lt;a href="https://blog.bitsrc.io/building-a-consistent-ui-design-system-4481fb37470f"&gt;consistent UI across products and teams&lt;/a&gt;, &lt;a href="https://blog.bitsrc.io/7-tools-for-building-your-design-system-in-2020-452d9c9b3b8e"&gt;new tools have emerged&lt;/a&gt; to bridge the gap between designers and developers. &lt;a href="https://codeburst.io/ui-design-system-and-component-library-where-things-break-d9c55dc6e386"&gt;This is no simple task, however&lt;/a&gt;: while code itself is really the only source of truth (it is what your users actually get), most tools try to bridge the gap from the designer’s end. In this category you can find Framer, Figma, InVision DSM and more.&lt;br&gt;
From the developer’s end, platforms like &lt;a href="https://bit.dev"&gt;Bit.dev&lt;/a&gt; host your next-gen component library and help drive adoption of shared components. The platform provides rendered visualizations of your actual source code so that designers can collaborate with developers and hold discussions over the source code itself, in a visual way.&lt;br&gt;
Another promising idea to take note of is &lt;a href="https://css-tricks.com/what-are-design-tokens/"&gt;design tokens&lt;/a&gt;: placing tokens in your code through which designers can control simple styling aspects (e.g. colors) directly from external collaboration tools. Integrated with platforms like Bit.dev, this can create a tighter workflow than ever before.&lt;br&gt;
&lt;/p&gt;
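&lt;p&gt;A token map can be as simple as a shared object that code turns into CSS custom properties (the token names and values below are illustrative):&lt;/p&gt;

```javascript
// Designers own the values; developers own how they are applied.
const tokens = {
  'color-primary': '#0b72e7',
  'spacing-md': '16px',
};

// Render the token map as CSS custom properties on :root.
function toCssVariables(map) {
  const lines = Object.entries(map).map(
    ([name, value]) => `  --${name}: ${value};`
  );
  return ':root {\n' + lines.join('\n') + '\n}';
}

console.log(toCssVariables(tokens));
```

&lt;p&gt;When a designer edits a value in the collaboration tool, regenerating this one stylesheet updates every component that references the variable.&lt;/p&gt;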
&lt;div class="ltag__link"&gt;
  &lt;a href="/jonisar" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__pic"&gt;
      &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--KIXYeytP--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://res.cloudinary.com/practicaldev/image/fetch/s--_HzUKqXm--/c_fill%2Cf_auto%2Cfl_progressive%2Ch_150%2Cq_auto%2Cw_150/https://dev-to-uploads.s3.amazonaws.com/uploads/user/profile_image/13629/9979db72-9117-41df-83ca-0404028463e3.jpg" alt="jonisar image"&gt;
    &lt;/div&gt;
  &lt;/a&gt;
  &lt;a href="/jonisar/ui-component-design-system-a-developer-s-guide-19fg" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__content"&gt;
      &lt;h2&gt;UI Component Design System: A Developer’s Guide&lt;/h2&gt;
      &lt;h3&gt;JoniSar ・ Oct 23 '19 ・ 10 min read&lt;/h3&gt;
      &lt;div class="ltag__link__taglist"&gt;
        &lt;span class="ltag__link__tag"&gt;#design&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#ui&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#javascript&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#frontend&lt;/span&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/a&gt;
&lt;/div&gt;
&lt;br&gt;
&lt;div class="ltag__link"&gt;
  &lt;a href="https://medium.com/codeburstio/ui-design-system-and-component-library-where-things-break-d9c55dc6e386" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__pic"&gt;
      &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--tWTMxjIU--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://miro.medium.com/fit/c/96/96/1%2ApLN3R5sML3dcjAvUZDWtOA.png" alt="Jonathan Saring"&gt;
    &lt;/div&gt;
  &lt;/a&gt;
  &lt;a href="https://medium.com/codeburstio/ui-design-system-and-component-library-where-things-break-d9c55dc6e386" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__content"&gt;
      &lt;h2&gt;UI Design System and Component Library: Where Things Break | by Jonathan Saring | codeburst&lt;/h2&gt;
      &lt;h3&gt;Jonathan Saring ・ &lt;time&gt;Aug 22, 2019&lt;/time&gt; ・ 8 min read
      &lt;div class="ltag__link__servicename"&gt;
        &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--KBvj_QRD--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://practicaldev-herokuapp-com.freetls.fastly.net/assets/medium_icon-90d5232a5da2369849f285fa499c8005e750a788fdbf34f5844d5f2201aae736.svg" alt="Medium Logo"&gt;
        Medium
      &lt;/div&gt;
    &lt;/h3&gt;
&lt;/div&gt;
  &lt;/a&gt;
&lt;/div&gt;
&lt;br&gt;
&lt;div class="ltag__link"&gt;
  &lt;a href="https://medium.com/bitsrcio/7-tools-for-building-your-design-system-in-2020-452d9c9b3b8e" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__pic"&gt;
      &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--tWTMxjIU--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://miro.medium.com/fit/c/96/96/1%2ApLN3R5sML3dcjAvUZDWtOA.png" alt="Jonathan Saring"&gt;
    &lt;/div&gt;
  &lt;/a&gt;
  &lt;a href="https://medium.com/bitsrcio/7-tools-for-building-your-design-system-in-2020-452d9c9b3b8e" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__content"&gt;
      &lt;h2&gt;7 Tools for Building Your Design System in 2020 | by Jonathan Saring | Bits and Pieces&lt;/h2&gt;
      &lt;h3&gt;Jonathan Saring ・ &lt;time&gt;Dec 4, 2019&lt;/time&gt; ・ 11 min read
      &lt;div class="ltag__link__servicename"&gt;
        &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--KBvj_QRD--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://practicaldev-herokuapp-com.freetls.fastly.net/assets/medium_icon-90d5232a5da2369849f285fa499c8005e750a788fdbf34f5844d5f2201aae736.svg" alt="Medium Logo"&gt;
        Medium
      &lt;/div&gt;
    &lt;/h3&gt;
&lt;/div&gt;
  &lt;/a&gt;
&lt;/div&gt;


&lt;h2&gt;
  
  
  11. WebAssembly: into the future?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://webassembly.org/"&gt;Web assembly&lt;/a&gt; brings language diversity into web development to cover gaps created by JavaScript. It is defined as a “a binary instruction format for a stack-based virtual machine. Wasm is designed as a portable target for compilation of high-level languages like C/C++/Rust, enabling deployment on the web for client and server applications”.&lt;br&gt;
In his post, &lt;a href="https://dev.toundefined"&gt;Eric Elliott&lt;/a&gt; &lt;a href="https://medium.com/javascript-scene/what-is-webassembly-the-dawn-of-a-new-era-61256ec5a8f6"&gt;elegantly outlines the concept’s benefits&lt;/a&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;An improvement to JavaScript:&lt;/strong&gt; Implement your performance critical stuff in wasm and import it like a standard JavaScript module.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A new language:&lt;/strong&gt; WebAssembly code defines an AST (Abstract Syntax Tree) represented in a &lt;strong&gt;binary format&lt;/strong&gt;. You can &lt;strong&gt;author and debug in a text format&lt;/strong&gt; so it’s readable.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A browser improvement:&lt;/strong&gt; &lt;strong&gt;Browsers will understand the binary format&lt;/strong&gt;, which means we’ll be able to compile binary bundles that compress smaller than the text JavaScript we use today. Smaller payloads mean faster delivery. Depending on &lt;strong&gt;compile-time optimization opportunities&lt;/strong&gt;, WebAssembly bundles may run faster than JavaScript, too!&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A Compile Target:&lt;/strong&gt; A way for other languages to get first-class binary support across the entire web platform stack.
To learn more about this concept, why it’s useful, where it will be used and why it’s not here yet, I suggest &lt;a href="https://medium.com/javascript-scene/why-we-need-webassembly-an-interview-with-brendan-eich-7fb2a60b0723"&gt;this great post&lt;/a&gt; and &lt;a href="https://www.youtube.com/watch?v=aZqhRICne_M&amp;amp;feature=emb_title"&gt;this great video&lt;/a&gt;.
&lt;a href="https://medium.com/javascript-scene/why-we-need-webassembly-an-interview-with-brendan-eich-7fb2a60b0723"&gt;&lt;strong&gt;Why We Need WebAssembly: An Interview with Brendan Eich&lt;/strong&gt; (Brendan Eich &amp;amp; Eric Elliott discuss WebAssembly details)&lt;/a&gt;
&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/aZqhRICne_M"&gt;
&lt;/iframe&gt;
&lt;/li&gt;
&lt;/ul&gt;
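&lt;p&gt;To ground the idea, here’s a tiny hand-assembled Wasm module exporting an add(a, b) function, instantiated straight from its bytes. The WebAssembly JavaScript API used here runs in modern browsers and in Node:&lt;/p&gt;

```javascript
// A minimal WebAssembly binary: one exported function, add(a, b) = a + b.
const bytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,       // magic + version
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type: (i32, i32) -> i32
  0x03, 0x02, 0x01, 0x00,                               // function section
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // export "add"
  0x0a, 0x09, 0x01, 0x07, 0x00,                         // code section header
  0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b,                   // local.get 0, local.get 1, i32.add, end
]);

const wasmModule = new WebAssembly.Module(bytes);       // compile the binary
const instance = new WebAssembly.Instance(wasmModule);  // instantiate it
console.log(instance.exports.add(2, 3)); // 5
```

&lt;p&gt;In practice you’d compile these bytes from C, C++ or Rust rather than write them by hand, then import the result like any other module.&lt;/p&gt;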
&lt;h2&gt;
  
  
  Learn more
&lt;/h2&gt;


&lt;div class="ltag__link"&gt;
  &lt;a href="https://medium.com/bitsrcio/13-top-react-component-libraries-for-2020-488cc810ca49" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__pic"&gt;
      &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--jpCwpTfl--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://miro.medium.com/fit/c/96/96/1%2Aw12_5tQWwj3V1nS8wOc3Hg.jpeg" alt="Fernando Doglio"&gt;
    &lt;/div&gt;
  &lt;/a&gt;
  &lt;a href="https://medium.com/bitsrcio/13-top-react-component-libraries-for-2020-488cc810ca49" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__content"&gt;
      &lt;h2&gt;13 Top React Component Libraries for 2020 | by Fernando Doglio | Bits and Pieces&lt;/h2&gt;
      &lt;h3&gt;Fernando Doglio ・ &lt;time&gt;Jun 8, 2020&lt;/time&gt; ・ 13 min read
      &lt;div class="ltag__link__servicename"&gt;
        &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--KBvj_QRD--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://practicaldev-herokuapp-com.freetls.fastly.net/assets/medium_icon-90d5232a5da2369849f285fa499c8005e750a788fdbf34f5844d5f2201aae736.svg" alt="Medium Logo"&gt;
        Medium
      &lt;/div&gt;
    &lt;/h3&gt;
&lt;/div&gt;
  &lt;/a&gt;
&lt;/div&gt;
&lt;br&gt;
&lt;div class="ltag__link"&gt;
  &lt;a href="https://medium.com/bitsrcio/11-top-angular-developer-tools-for-2020-3d2621f1e157" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__pic"&gt;
      &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--F8kFRNS---/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://miro.medium.com/fit/c/96/96/2%2AePGCw-LWOt-vRWj4REfBlA.jpeg" alt="Giancarlo Buomprisco"&gt;
    &lt;/div&gt;
  &lt;/a&gt;
  &lt;a href="https://medium.com/bitsrcio/11-top-angular-developer-tools-for-2020-3d2621f1e157" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__content"&gt;
      &lt;h2&gt;11 Top Angular Developer Tools for 2020 | by Giancarlo Buomprisco | Bits and Pieces&lt;/h2&gt;
      &lt;h3&gt;Giancarlo Buomprisco ・ &lt;time&gt;Dec 24, 2019&lt;/time&gt; ・ 8 min read
      &lt;div class="ltag__link__servicename"&gt;
        &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--KBvj_QRD--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://practicaldev-herokuapp-com.freetls.fastly.net/assets/medium_icon-90d5232a5da2369849f285fa499c8005e750a788fdbf34f5844d5f2201aae736.svg" alt="Medium Logo"&gt;
        Medium
      &lt;/div&gt;
    &lt;/h3&gt;
&lt;/div&gt;
  &lt;/a&gt;
&lt;/div&gt;
&lt;br&gt;
&lt;div class="ltag__link"&gt;
  &lt;a href="https://blog.bitsrc.io/top-10-vuejs-developer-tools-becd61375447" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__pic"&gt;
      &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--gPPTQzS9--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://miro.medium.com/fit/c/96/96/1%2A0yN1ln4bBjuXhg10DyW-6Q.jpeg" alt="Shanika Wickramasinghe"&gt;
    &lt;/div&gt;
  &lt;/a&gt;
  &lt;a href="https://blog.bitsrc.io/top-10-vuejs-developer-tools-becd61375447" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__content"&gt;
      &lt;h2&gt;11 Top VueJS Developer Tools for 2020 | by Shanika Wickramasinghe | Bits and Pieces&lt;/h2&gt;
      &lt;h3&gt;Shanika Wickramasinghe ・ &lt;time&gt;Dec 24, 2019&lt;/time&gt; ・ 8 min read
      &lt;div class="ltag__link__servicename"&gt;
        &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--KBvj_QRD--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://practicaldev-herokuapp-com.freetls.fastly.net/assets/medium_icon-90d5232a5da2369849f285fa499c8005e750a788fdbf34f5844d5f2201aae736.svg" alt="Medium Logo"&gt;
        blog.bitsrc.io
      &lt;/div&gt;
    &lt;/h3&gt;
&lt;/div&gt;
  &lt;/a&gt;
&lt;/div&gt;


</description>
      <category>react</category>
      <category>javascript</category>
      <category>frontend</category>
      <category>ui</category>
    </item>
    <item>
      <title>Reuse React Components Between Apps Like a Pro</title>
      <dc:creator>Joni Sar</dc:creator>
      <pubDate>Wed, 20 Nov 2019 13:58:52 +0000</pubDate>
      <link>https://dev.to/jonisar/reuse-react-components-between-apps-like-a-pro-2a39</link>
      <guid>https://dev.to/jonisar/reuse-react-components-between-apps-like-a-pro-2a39</guid>
      <description>&lt;p&gt;One of the reasons we love React is the truly reusable nature of its components, even compared to other frameworks. Reusing components means you can save time writing the same code, prevent bugs and mistakes, and keep your UI consistent for users across your different applications.&lt;/p&gt;

&lt;p&gt;But reusing React components between apps can be harder than it sounds. In the past, this process involved splitting repositories, boilerplating packages, configuring builds, refactoring our apps and more.&lt;/p&gt;

&lt;p&gt;In this post, I'll show how to &lt;a href="https://bit.dev/" rel="noopener noreferrer"&gt;use Bit&lt;/a&gt; (&lt;a href="https://github.com/teambit/bit" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;) in order to make this process much easier, saving around 90% of the work. It will also let you gradually collect existing components from your apps into a reusable collection for your team to share - &lt;a href="https://bit.dev/collections" rel="noopener noreferrer"&gt;like these ones&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkb3sn7l6g011e1vs6l9d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkb3sn7l6g011e1vs6l9d.png" alt="reuse react components" width="800" height="379"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In this short tutorial, we'll learn how to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Quickly set up a Bit workspace&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Track and isolate components in your app&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Define a zero-config React compiler&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Version and export components from your app&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Use the components in a new app&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;Bonus: Leveraging Bit to modify the component from the consuming app (yes, really), and syncing the changes between the two apps.&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Quick Setup
&lt;/h1&gt;

&lt;p&gt;So for this tutorial, we've prepared &lt;a href="https://github.com/teambit/bit-react-tutorial" rel="noopener noreferrer"&gt;an example React app on GitHub&lt;/a&gt; that you can clone.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;git clone https://github.com/teambit/bit-react-tutorial
&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;bit-react-tutorial
&lt;span class="nv"&gt;$ &lt;/span&gt;yarn 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, go ahead and install Bit.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;npm &lt;span class="nb"&gt;install &lt;/span&gt;bit-bin &lt;span class="nt"&gt;-g&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, we'll need a remote collection to host the shared components. You can set one up on &lt;a href="https://docs.bit.dev/docs/bit-server" rel="noopener noreferrer"&gt;your own server&lt;/a&gt;, but let's use Bit's free component hub instead. This way our collection can be visualized and shared with our team, which is very useful.&lt;/p&gt;

&lt;p&gt;Quickly head over to &lt;a href="https://bit.dev" rel="noopener noreferrer"&gt;bit.dev and create a free collection&lt;/a&gt;. It should take less than a minute.&lt;/p&gt;

&lt;p&gt;Now return to your terminal and run &lt;code&gt;bit login&lt;/code&gt; to connect your local workspace with the remote collection, where we'll export our components.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;bit login
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Cool. Now return to the project you've cloned and init a Bit workspace:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;bit init &lt;span class="nt"&gt;--package-manager&lt;/span&gt; yarn
successfully initialized a bit workspace.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. Next, let's track and isolate a reusable component from the app.&lt;/p&gt;

&lt;h1&gt;
  
  
  Track and isolate reusable components
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fad6zwhrb5zhde3ug03tc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fad6zwhrb5zhde3ug03tc.png" alt="reusable-react-component-example" width="800" height="414"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Bit lets you track components in your app and isolates them for reuse, including automatically defining all their dependencies. You can track multiple components using a glob pattern (&lt;code&gt;src/components/*&lt;/code&gt;) or specify a path to a specific component. In this example, we'll use the latter.&lt;/p&gt;

&lt;p&gt;Let's use the &lt;code&gt;bit add&lt;/code&gt; command to track the "product list" component in the app. We'll track it with the ID 'product-list'. Here's &lt;a href="https://bit.dev/bit/react-tutorial/product-list" rel="noopener noreferrer"&gt;an example of how it will look&lt;/a&gt; as a shared component in bit.dev.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;bit add src/components/product-list
tracking component product-list:
added src/components/product-list/index.js
added src/components/product-list/product-list.css
added src/components/product-list/products.js
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's run a quick &lt;code&gt;bit status&lt;/code&gt; to verify that Bit successfully tracked all of the component's files. You can use this command at any stage to learn more; it's quite useful!&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;bit status
new components
&lt;span class="o"&gt;(&lt;/span&gt;use &lt;span class="s2"&gt;"bit tag --all [version]"&lt;/span&gt; to lock a version with all your changes&lt;span class="o"&gt;)&lt;/span&gt;

     &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; product-list ... ok
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;
  
  
  Define a zero-config reusable React compiler
&lt;/h1&gt;

&lt;p&gt;To make sure the component can run outside of the project, we'll tell Bit to define a reusable React compiler for it. This is part of how Bit isolates components for reuse, while saving you the work of having to define a build step for every component.&lt;/p&gt;

&lt;p&gt;Let's import the &lt;a href="https://bit.dev/bit/envs/compilers/react" rel="noopener noreferrer"&gt;React compiler&lt;/a&gt; into your project's workspace. You can find more compilers &lt;a href="https://bit.dev/bit/envs" rel="noopener noreferrer"&gt;here in this collection&lt;/a&gt;, including &lt;a href="https://bit.dev/bit/envs/compilers/react-typescript" rel="noopener noreferrer"&gt;react-typescript&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;bit import bit.envs/compilers/react &lt;span class="nt"&gt;--compiler&lt;/span&gt;
the following component environments were installed
- bit.envs/react@0.1.3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Right now the component may consume dependencies from your project. Bit builds it in an &lt;em&gt;isolated environment&lt;/em&gt; to make sure the process will also succeed in the cloud or in any other project. To build your component, run this command inside your React project:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;bit build
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;
  
  
  Version and export reusable components
&lt;/h1&gt;

&lt;p&gt;Now let's export the component to your collection. As you can see, you don't need to split your repos or refactor your app. &lt;/p&gt;

&lt;p&gt;First, let's tag a version for the component. Bit lets you version and export individual components, and since it knows about each component's dependents, you can later bump the version of a single component and all its dependents at once.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;bit tag &lt;span class="nt"&gt;--all&lt;/span&gt; 0.0.1
1 component&lt;span class="o"&gt;(&lt;/span&gt;s&lt;span class="o"&gt;)&lt;/span&gt; tagged
&lt;span class="o"&gt;(&lt;/span&gt;use &lt;span class="s2"&gt;"bit export [collection]"&lt;/span&gt; to push these components to a remote&lt;span class="s2"&gt;")
(use "&lt;/span&gt;bit untag&lt;span class="s2"&gt;" to unstage versions)

new components
(first version for components)
     &amp;gt; product-list@0.0.1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can run a quick &lt;code&gt;bit status&lt;/code&gt; to verify, if you like, and then export the component to your collection:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;bit &lt;span class="nb"&gt;export&lt;/span&gt; &amp;lt;username&amp;gt;.&amp;lt;collection-name&amp;gt;
exported 1 components to &amp;lt;username&amp;gt;.&amp;lt;collection-name&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now head over to your bit.dev collection and see how it looks!&lt;br&gt;
You can &lt;a href="https://docs.bit.dev/docs/tutorials/bit-react-tutorial#preview-the-react-component" rel="noopener noreferrer"&gt;save a visual example for your component&lt;/a&gt;, so you and your team can easily discover, try and use this component later on.&lt;/p&gt;
&lt;h1&gt;
  
  
  Install components in a new app
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffq99v2ey4ti5vx9cbyo6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffq99v2ey4ti5vx9cbyo6.png" alt="reuse-react-component-in-new-app" width="597" height="314"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Create a new React app using create-react-app (or your own).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;npx create-react-app my-new-app
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Move over to the new app you created.&lt;br&gt;
Install the component from bit.dev:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;yarn add @bit/&amp;lt;username&amp;gt;.&amp;lt;collection-name&amp;gt;.product-list &lt;span class="nt"&gt;--save&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it! You can now &lt;a href="https://docs.bit.dev/docs/tutorials/bit-react-tutorial#use-in-your-application" rel="noopener noreferrer"&gt;use the component in your new app&lt;/a&gt;!&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If you want to use npm, run &lt;code&gt;npm install&lt;/code&gt; once after the project is created, so that a package-lock.json file is created and npm organizes dependencies correctly.&lt;/li&gt;
&lt;/ul&gt;
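
&lt;p&gt;For example, using the installed component could look something like this (a sketch only: the exact package name depends on your username and collection name):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// App.js — hypothetical usage sketch; replace &amp;lt;username&amp;gt; and
// &amp;lt;collection-name&amp;gt; with your own values
import React from 'react';
import ProductList from '@bit/&amp;lt;username&amp;gt;.&amp;lt;collection-name&amp;gt;.product-list';

function App() {
  return &amp;lt;ProductList /&amp;gt;;
}

export default App;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;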

&lt;h1&gt;
  
  
  Modify components from the consuming app
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5c2kp0wggc385upepu6y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5c2kp0wggc385upepu6y.png" alt="develop-reusable-react-component" width="590" height="331"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now let's use Bit to &lt;a href="https://docs.bit.dev/docs/tutorials/bit-react-tutorial#modify-the-component" rel="noopener noreferrer"&gt;import the component's source-code&lt;/a&gt; from bit.dev and make some changes, right from the new app.&lt;/p&gt;

&lt;p&gt;First, init a Bit workspace for the new project:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;bit init
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And import the component:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;bit import &amp;lt;username&amp;gt;.&amp;lt;collection-name&amp;gt;/product-list
successfully imported one component
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here is what happened:&lt;/p&gt;

&lt;p&gt;A new top-level components folder is created that includes the component's code, along with its compiled output and node_modules (in this case the node_modules folder is empty, as all of the component's dependencies are peer dependencies taken from the root project).&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;.bitmap&lt;/code&gt; file was modified to include a reference to the component.&lt;br&gt;
The package.json file was modified to point to the local files rather than the remote package. Your package.json now contains:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@bit/&amp;lt;username&amp;gt;.&amp;lt;collection-name&amp;gt;.product-list&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;file:./components/product-list&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Start your application to make sure it still works. As you'll see, no changes are required: Bit takes care of everything.&lt;/p&gt;

&lt;p&gt;Then, just go ahead and make changes to the code any way you like!&lt;br&gt;
&lt;a href="https://docs.bit.dev/docs/tutorials/bit-react-tutorial#update-the-code" rel="noopener noreferrer"&gt;Here's an example&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Now run a quick &lt;code&gt;bit status&lt;/code&gt; to see that the code has changed. Since Bit tracks the source code itself (via a Git extension), it "knows" that the component was modified.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;bit status
modified components
&lt;span class="o"&gt;(&lt;/span&gt;use &lt;span class="s2"&gt;"bit tag --all [version]"&lt;/span&gt; to lock a version with all your changes&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;(&lt;/span&gt;use &lt;span class="s2"&gt;"bit diff"&lt;/span&gt; to compare changes&lt;span class="o"&gt;)&lt;/span&gt;

     &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; product-list ... ok
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now tag a version and export the component back to bit.dev:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ bit tag product-list
1 component(s) tagged
(use "bit export [collection]" to push these components to a remote)
(use "bit untag" to unstage versions)

changed components
(components that got a version bump)
     &amp;gt; &amp;lt;username&amp;gt;.&amp;lt;collection-name&amp;gt;/product-list@0.0.2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;and...&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;bit &lt;span class="nb"&gt;export&lt;/span&gt; &amp;lt;username&amp;gt;.&amp;lt;collection-name&amp;gt;
exported 1 components to &amp;lt;username&amp;gt;.&amp;lt;collection-name&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can now see the updated version with the changes in bit.dev!&lt;/p&gt;

&lt;h1&gt;
  
  
  Update changes in the first app (checkout)
&lt;/h1&gt;

&lt;p&gt;Switch back to the &lt;code&gt;react-tutorial&lt;/code&gt; app you cloned and exported the component from, and check for updates:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;bit import
successfully imported one component
- updated &amp;lt;username&amp;gt;.&amp;lt;collection-name&amp;gt;/product-list new versions: 0.0.2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run &lt;code&gt;bit status&lt;/code&gt; to see that an update is available for &lt;code&gt;product-list&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;bit status
pending updates
&lt;span class="o"&gt;(&lt;/span&gt;use &lt;span class="s2"&gt;"bit checkout [version] [component_id]"&lt;/span&gt; to merge changes&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;(&lt;/span&gt;use &lt;span class="s2"&gt;"bit diff [component_id] [new_version]"&lt;/span&gt; to compare changes&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;(&lt;/span&gt;use &lt;span class="s2"&gt;"bit log [component_id]"&lt;/span&gt; to list all available versions&lt;span class="o"&gt;)&lt;/span&gt;

    &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &amp;lt;username&amp;gt;.react-tutorial/product-list current: 0.0.1 latest: 0.0.2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Merge the changes done to the component to your project. The structure of the command is &lt;code&gt;bit checkout &amp;lt;version&amp;gt; &amp;lt;component&amp;gt;&lt;/code&gt;. So you run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;bit checkout 0.0.2 product-list
successfully switched &amp;lt;username&amp;gt;.react-tutorial/product-list to version 0.0.2
updated src/components/product-list/index.js
updated src/components/product-list/product-list.css
updated src/components/product-list/products.js
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Bit performs a git merge. The code from the updated component is now merged into your code.&lt;/p&gt;

&lt;p&gt;Run the application again to verify that it works properly with the updated component:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;yarn start
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. A change was moved between the two projects. Your application is running with an updated component.&lt;/p&gt;

&lt;p&gt;Happy coding!&lt;/p&gt;

&lt;h1&gt;
  
  
  Conclusion
&lt;/h1&gt;

&lt;p&gt;By making it easier to reuse React components between applications, you can speed up your development with React, keep a consistent UI, prevent bugs and mistakes, and collaborate better as a team over a collection of shared components. It's also a useful way to gradually create a reusable UI component library for your team, without having to stop everything or lose focus. &lt;/p&gt;

&lt;p&gt;Feel free to try it out yourself and &lt;a href="https://github.com/teambit/bit" rel="noopener noreferrer"&gt;explore the project on GitHub&lt;/a&gt;. Happy coding! &lt;/p&gt;

</description>
      <category>javascript</category>
      <category>frontend</category>
      <category>react</category>
      <category>ui</category>
    </item>
    <item>
      <title>UI Component Design System: A Developer’s Guide</title>
      <dc:creator>Joni Sar</dc:creator>
      <pubDate>Wed, 23 Oct 2019 12:08:08 +0000</pubDate>
      <link>https://dev.to/jonisar/ui-component-design-system-a-developer-s-guide-19fg</link>
      <guid>https://dev.to/jonisar/ui-component-design-system-a-developer-s-guide-19fg</guid>
      <description>&lt;p&gt;Component design systems let teams collaborate to introduce a &lt;a href="https://blog.bitsrc.io/building-a-consistent-ui-design-system-4481fb37470f" rel="noopener noreferrer"&gt;consistent user visual and functional experience&lt;/a&gt; across different products and applications.&lt;/p&gt;

&lt;p&gt;On the designer's side, a predefined style guide and a set of reusable master components enable a consistent design and brand across all the different products built by the organization. This is why great teams like &lt;a href="https://eng.uber.com/introducing-base-web/" rel="noopener noreferrer"&gt;Uber&lt;/a&gt;, &lt;a href="https://airbnb.design/building-a-visual-language/" rel="noopener noreferrer"&gt;Airbnb&lt;/a&gt;, &lt;a href="https://polaris.shopify.com/" rel="noopener noreferrer"&gt;Shopify&lt;/a&gt; and many others work so hard to build one.&lt;/p&gt;

&lt;p&gt;On the developer's side, a &lt;a href="https://stg.bit.dev/design-system" rel="noopener noreferrer"&gt;reusable set of components&lt;/a&gt; helps to standardize front-end development across different projects, save time building new apps, reduce maintenance overhead and provide easier onboarding for new team members.&lt;/p&gt;

&lt;p&gt;Most importantly, on the user's side, a successful component design system means less confusion, better navigation of your products, warm and fuzzy brand-familiarity feeling and better overall satisfaction and happiness. For your business, this means better results.&lt;/p&gt;

&lt;p&gt;But building a successful design system can be trickier than you might think. Bridging the gap between designers and developers is no simple task, both while building your system and over time. In this post, we’ll walk through the fundamentals of successfully building a component design system, using it across projects and products, and growing a thriving and &lt;a href="https://blog.bitsrc.io/getting-adoption-for-design-systems-a-practical-guide-cde86ee9bf40" rel="noopener noreferrer"&gt;collaborative component ecosystem within the organization&lt;/a&gt; that brings everyone together. We’ll also introduce some shiny modern tools that can help you build it. Please feel free to comment below, ask anything, or share from your own experience! &lt;/p&gt;

&lt;h1&gt;
  
  
  Bridging the gap between design and development through components
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmfaap481lgylikf7tb82.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmfaap481lgylikf7tb82.png" alt="Component design systemst" width="800" height="422"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When building your system you will face several challenges. The first is achieving true collaboration between &lt;a href="https://blog.bitsrc.io/let-everyone-in-your-company-see-your-reusable-components-270cd3213fe9" rel="noopener noreferrer"&gt;designers, developers and everyone else&lt;/a&gt; (product, marketing etc). This is hard. Designers use tools like Photoshop and Sketch, which are built for generating “flat” visual assets that don’t translate into the real code developers will use. Tools like &lt;a href="https://www.framer.com/" rel="noopener noreferrer"&gt;Framer&lt;/a&gt; aim to bridge this gap on the designer’s side.&lt;/p&gt;

&lt;p&gt;Developers work with Git (and GitHub) and use different languages and technologies (such as component-based frameworks: React, Vue etc) and have to translate the design into code as the source of truth of the design’s implementation. Tools like &lt;a href="https://bit.dev" rel="noopener noreferrer"&gt;Bit&lt;/a&gt; turn real components written in your codebase into a visual and collaborative design system (&lt;a href="https://bit.dev/collections" rel="noopener noreferrer"&gt;examples&lt;/a&gt;), making it easy to reuse and update components across apps, and visualizing them for designers.&lt;/p&gt;

&lt;p&gt;Modern components are the key to bridging this gap. They function as both visual UI design elements as well as encapsulated and reusable functional units that implement UX functionality that can be used and standardized across different projects in your organization’s codebase. &lt;/p&gt;
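
&lt;p&gt;As a concrete (and purely illustrative) example, a design-system button encapsulates both the visual style and the behavior in one reusable unit; the names below are made up:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Button.js — a hypothetical design-system component: it encapsulates
// the visual style (via its CSS classes) and the behavior (click
// handling, disabled state) in a single reusable unit
import React from 'react';

export function Button({ variant = 'primary', disabled, onClick, children }) {
  return (
    &amp;lt;button
      className={`ds-button ds-button--${variant}`}
      disabled={disabled}
      onClick={onClick}
    &amp;gt;
      {children}
    &amp;lt;/button&amp;gt;
  );
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;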

&lt;p&gt;&lt;a href="https://bit.dev/components" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo75adqf0n5tkkcrr8e8n.gif" alt="Alt Text" width="720" height="366"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To bridge the gap, you’d have to let designers and other non-coding stakeholders collaborate over the source of truth, which is code. You can use &lt;a href="https://bit.dev" rel="noopener noreferrer"&gt;Bit&lt;/a&gt; or similar tools to bridge this gap and build a collaborative component economy where developers can easily &lt;a href="https://blog.bitsrc.io/getting-adoption-for-design-systems-a-practical-guide-cde86ee9bf40" rel="noopener noreferrer"&gt;build, distribute and adopt components&lt;/a&gt; while designers and everyone else can collaborate to build and align the design implementation of components across applications.&lt;/p&gt;

&lt;h1&gt;
  
  
  Choosing your stack and tools
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://medium.com/memory-leak/introducing-redpoints-design-and-front-end-engineering-landscape-ab377302a164" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnvb0ovbgkfao2jldbfhj.jpeg" alt="design-system-landscape" width="800" height="478"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The choice of technologies and tools is a major key in the success of your design system. We’ll try to narrow it down to a few key choices you’d have to make along the way:&lt;/p&gt;

&lt;h4&gt;
  
  
  Framework or no framework?
&lt;/h4&gt;

&lt;p&gt;Modern frameworks like React, Vue and Angular provide an environment where you can build components and build applications with them. Whether you choose a view library or a full-blown MVC framework, you can start building your components with a mature and extensive toolchain and community behind you. However, such frameworks might not be future-proof, and can limit the reuse and standardization of components across different platforms, stacks and use-cases.&lt;/p&gt;

&lt;p&gt;Another way to go is &lt;a href="https://blog.bitsrc.io/9-web-component-ui-libraries-you-should-know-in-2019-9d4476c3f103" rel="noopener noreferrer"&gt;framework-agnostic web components&lt;/a&gt;: custom components and widgets built on the Web Component standards that work across modern browsers and can be used with any JavaScript library or framework that works with HTML.&lt;/p&gt;
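
&lt;p&gt;A minimal sketch of such a framework-agnostic component, using the standard Custom Elements API (the element name here is made up):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// A hypothetical widget built on the Custom Elements standard; once
// defined, &amp;lt;brand-badge&amp;gt; works in plain HTML or inside any framework
class BrandBadge extends HTMLElement {
  connectedCallback() {
    this.textContent = this.getAttribute('label') || 'badge';
    this.style.padding = '2px 8px';
    this.style.borderRadius = '4px';
  }
}

customElements.define('brand-badge', BrandBadge);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;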

&lt;p&gt;This means more reuse, better stability, abstraction and standardization, less work, and pretty much everything else that comes with better modularity. While many people are waiting on projects like WebAssembly, in the past year &lt;a href="https://blog.bitsrc.io/7-tools-for-developing-web-components-in-2019-1d5b7360654d" rel="noopener noreferrer"&gt;we've seen new tools and technologies&lt;/a&gt; rise to bring the future today.&lt;/p&gt;

&lt;p&gt;The core concept of a standardized component system that works everywhere &lt;a href="https://hackernoon.com/7-frontend-javascript-trends-and-tools-you-should-know-for-2020-fb1476e41083" rel="noopener noreferrer"&gt;goes naturally well with the core concept of web components&lt;/a&gt;, so don’t be quick to overlook it despite the less mature ecosystem existing around it today.&lt;/p&gt;

&lt;h4&gt;
  
  
  Component library or no library?
&lt;/h4&gt;

&lt;p&gt;Building a component library is basically a way to reduce the overhead that comes with maintaining multiple repositories for multiple components. Instead, you group multiple components into one repository and distribute it like a multi-song CD music album. &lt;/p&gt;

&lt;p&gt;The tradeoff? App developers (component consumers) can’t use, update or modify the individual components they need, and they struggle with coupling the development of their products to that of the library. Component collaboration platforms like &lt;a href="https://bit.dev" rel="noopener noreferrer"&gt;Bit&lt;/a&gt; can greatly mitigate this pain by sharing your library as a “playlist”-like system of components that people can easily discover, use, update and collaborate over across projects and teams. Every developer can share, find, use and update components right from their projects.&lt;/p&gt;

&lt;p&gt;Most larger organizations implement a library (&lt;a href="https://blog.bitsrc.io/11-react-component-libraries-you-should-know-178eb1dd6aa4" rel="noopener noreferrer"&gt;examples&lt;/a&gt;) to consolidate the development of their components, centralize development workflows around one project and control changes. In today's ecosystem, it’s hard to scale component-based design systems without libraries, mostly due to development workflows (PRs, issues, deployment etc). In the future, we might see more democratized component economies where everyone can freely share and collaborate.&lt;/p&gt;

&lt;p&gt;When building your library you effectively build a multi-component monorepo. &lt;a href="https://github.com/teambit/bit" rel="noopener noreferrer"&gt;Open-source tools like bit-cli can help&lt;/a&gt; you isolate each component, automatically define all its dependencies and environments, test and build it in isolation, and share it as a standalone reusable unit. It also lets app-developers import and suggest updates to components right from their own projects, to increase the adoption of shared components.&lt;/p&gt;

&lt;h4&gt;
  
  
  Component discoverability and visualization
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa92ty8goncjotq1karfx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa92ty8goncjotq1karfx.png" alt="Component design systems examples" width="800" height="376"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When building and distributing components you must create a way for other developers, and for non-developers collaborating with you, to discover and learn exactly which components you have, what they look like, how they behave in different states and how to use them.&lt;/p&gt;

&lt;p&gt;If you work with tools like Bit, you get this out of the box, as all your components &lt;a href="https://bit.dev/collections" rel="noopener noreferrer"&gt;are visualized in a design system made from your actual components&lt;/a&gt;. Developers can use and develop components from the same place where designers, marketers and product managers view and monitor them.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://bit.dev/collections" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feaf1eaposw72qg37g2e3.gif" alt="Component design systems" width="600" height="329"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If not, you can create your own documentation portal or leverage tools like &lt;a href="https://storybook.js.org/" rel="noopener noreferrer"&gt;Storybook&lt;/a&gt; to organize the visual documentation of the components you develop. Either way, without making components visually discoverable, it will be hard to achieve true reusability and collaboration.&lt;/p&gt;

&lt;h1&gt;
  
  
  Building your design system: top-down vs. bottom-up
&lt;/h1&gt;

&lt;p&gt;There are two ways to build a component design system. Choosing the right one mostly depends on who you are and what you need to achieve.&lt;/p&gt;

&lt;h3&gt;
  
  
  Design first, then implement reusable components
&lt;/h3&gt;

&lt;p&gt;The first, mostly used by larger organizations that need to standardize UX/UI and development across multiple teams and products, is to &lt;strong&gt;design components first&lt;/strong&gt; and then make sure this design is implemented as components (often building a library) and used everywhere. &lt;/p&gt;

&lt;p&gt;A super over-simplified structure of this workflow looks like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Build a visual language and design components&lt;/li&gt;
&lt;li&gt;Implement components in a git-based project in GitHub/Gitlab etc&lt;/li&gt;
&lt;li&gt;Distribute using component-platforms like Bit and/or to package managers&lt;/li&gt;
&lt;li&gt;Standardize instances of components across projects and apps&lt;/li&gt;
&lt;li&gt;Collaboratively monitor, update and evolve components (using Bit or other tools)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Code first, then collect components into a design system
&lt;/h3&gt;

&lt;p&gt;The second, often used by smaller and younger teams or startups, is to &lt;strong&gt;build first&lt;/strong&gt; and then collect existing components from your apps into one system, align the design, and keep going from there. This approach saves the time consumed by a dedicated design-system project, time which startups often can’t afford to spend. &lt;a href="https://github.com/teambit/bit" rel="noopener noreferrer"&gt;bit-cli&lt;/a&gt; introduces the ability to virtually isolate components from existing repositories, build and export each of them individually as a standalone reusable unit, and collect them into one visual system made of your real code. So, you can probably use it to collect your components into a system in a few hours, without having to refactor, split or configure anything.&lt;/p&gt;

&lt;p&gt;This workflow looks like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Isolate and collect components already existing in your apps into one collection (Bit is useful)&lt;/li&gt;
&lt;li&gt;Bring in designers and other stakeholders to learn what you have and introduce your visual language into this collection&lt;/li&gt;
&lt;li&gt;Update components across projects to align to your new collection&lt;/li&gt;
&lt;li&gt;Use these components to build more products and apps&lt;/li&gt;
&lt;li&gt;Collaboratively monitor, update and evolve components (using Bit or other tools)&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Design systems and atomic design
&lt;/h4&gt;

&lt;p&gt;By comparing components and their composition to atoms, molecules, and organisms, we can think of the design of our UI as a composition of self-contained modules put together.&lt;/p&gt;

&lt;p&gt;Atomic Design helps you &lt;a href="https://blog.bitsrc.io/atomic-design-and-ui-components-theory-to-practice-f200db337c24" rel="noopener noreferrer"&gt;create and maintain robust design systems&lt;/a&gt;, allowing you to roll out higher quality, more consistent UIs faster than ever before. &lt;/p&gt;

&lt;p&gt;Learn more in this post: &lt;a href="https://blog.bitsrc.io/atomic-design-and-ui-components-theory-to-practice-f200db337c24" rel="noopener noreferrer"&gt;Atomic Design and UI Components: Theory to Practice&lt;/a&gt;.&lt;/p&gt;
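
&lt;p&gt;Atomic design maps naturally onto plain code. Here is a minimal, framework-free sketch (the component names and string output are invented purely for illustration) of atoms composing into a molecule, which composes into an organism:&lt;/p&gt;

```javascript
// Atoms: the smallest self-contained units (invented examples).
const button = (label) => `[button: ${label}]`;
const input = (placeholder) => `[input: ${placeholder}]`;

// Molecule: a small composition of atoms.
const searchBox = () => `${input('Search')} ${button('Go')}`;

// Organism: a larger UI section composed of molecules and atoms.
const header = (title) => `[header: ${title} | ${searchBox()}]`;

console.log(header('My App'));
```

&lt;p&gt;The same layering applies whether the "atoms" are React components, Vue components, or web components.&lt;/p&gt;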

&lt;h1&gt;
  
  
  Collaboratively manage and update components
&lt;/h1&gt;

&lt;p&gt;Your design system is a living creature that changes as its environment does. The design might change, and so should the components. Components might change to fit new products, and so should the design. So, you must think of this process as a two-way collaborative workflow.&lt;/p&gt;

&lt;h4&gt;
  
  
  Controlling component changes across projects
&lt;/h4&gt;

&lt;p&gt;When a component is used in two or more projects, sooner or later you will have to change it. So, you should be able to update a component from one project to another, consolidate code changes, and update all dependent components impacted by the change.&lt;/p&gt;
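
&lt;p&gt;The impact analysis described above boils down to a reverse walk over the component dependency graph. This is only an illustrative sketch with invented component names, not how any particular tool implements it:&lt;/p&gt;

```javascript
// Which components depend on which (hypothetical data).
const dependsOn = {
  'app-header': ['search-box', 'logo'],
  'search-box': ['button', 'input'],
  'login-form': ['button', 'input'],
  'logo': [],
  'button': [],
  'input': [],
};

// Fixed-point loop: keep adding components until no new ones
// (transitively) depend on the changed component.
function impactedBy(changed, graph) {
  const impacted = new Set();
  let grew = true;
  while (grew) {
    grew = false;
    for (const [name, deps] of Object.entries(graph)) {
      if (impacted.has(name)) continue;
      if (deps.includes(changed) || deps.some((d) => impacted.has(d))) {
        impacted.add(name);
        grew = true;
      }
    }
  }
  return [...impacted].sort();
}

console.log(impactedBy('button', dependsOn));
// logs: [ 'app-header', 'login-form', 'search-box' ]
```

&lt;p&gt;Updating a shared component safely means running the tests of everything this walk returns before tagging a new version.&lt;/p&gt;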

&lt;p&gt;If you are using &lt;a href="https://bit.dev" rel="noopener noreferrer"&gt;Bit&lt;/a&gt; this is fairly easy. You can import a component into any project, make changes, and update them as a new version. Since Bit “knows” exactly which other components depend on this component in different projects, you can update all of them at once and verify that nothing breaks before updating. Since Bit extends Git, you can merge the changes across projects just like you do in a single repository. All the changes will be visually available to view and monitor in your shared &lt;a href="https://bit.dev" rel="noopener noreferrer"&gt;bit.dev&lt;/a&gt; component collection.&lt;/p&gt;

&lt;p&gt;If not, things become trickier: your component infrastructure team will have to enforce library updates on every project using those libraries, which impairs flexibility, creates friction and makes it hard to achieve true standardization through adoption. Harder, but not impossible; here is &lt;a href="https://medium.com/walmartlabs/how-to-achieve-reusability-with-react-components-81edeb7fb0e0" rel="noopener noreferrer"&gt;how Walmart Labs do it&lt;/a&gt;. You will also have to make sure that changes to both code and design stay aligned across your design tools and library docs, to avoid misunderstandings and mistakes.&lt;/p&gt;

&lt;h1&gt;
  
  
  Grow a component ecosystem in your organization
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4iy52k0jeekpifei8qqh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4iy52k0jeekpifei8qqh.png" alt="component-economyt" width="700" height="511"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Building a design system is really about growing a component ecosystem in your organization. This means that managing components isn’t a one-way street; you have to include the app-builders (component consumers) in this new economy, so that they will actually use the components you build in their applications and products.&lt;/p&gt;

&lt;p&gt;Share components that people can easily find and use. Let them collaborate, and make it easy and fun to do so. Don’t force developers to install heavy libraries or dive too deep into your library just to make a small pull request. Don’t make it hard for designers to learn exactly which components change over time, and make it easy for them to collaborate in the process.&lt;/p&gt;

&lt;p&gt;Your component design system is a &lt;strong&gt;living and breathing organism&lt;/strong&gt; that grows and evolves over time. If you try to force it on your organization, it might die. Instead, prefer the democratization of components, their development and their design. Regulate this process to achieve standardization, but don’t block or impair adoption at any cost. &lt;a href="https://bit.dev" rel="noopener noreferrer"&gt;Bit&lt;/a&gt; is probably the most prominent power tool here too, but if you know of others, please do share.&lt;/p&gt;

&lt;h1&gt;
  
  
  Conclusion
&lt;/h1&gt;

&lt;p&gt;Design systems help to create consistency in the visual and functional experience you give your users, while forming your brand across different products and applications. Modern components, with or without a framework, let you implement this system as a living set of building blocks that can and should be shared across projects to standardize and speed up development.&lt;/p&gt;

&lt;p&gt;As designers and developers use different tools, it’s critical to bring them together over a single source of truth, which is really your code, since this is what your users actually experience. A democratized and collaborative process between developers, designers, product managers, marketers and everyone else is the only way to grow a thriving and sustainable component ecosystem that breathes life into your design system.&lt;/p&gt;

&lt;p&gt;Modern tools built for this purpose, such as &lt;a href="https://bit.dev" rel="noopener noreferrer"&gt;Bit&lt;/a&gt; and others (&lt;a href="https://www.framer.com" rel="noopener noreferrer"&gt;FramerX&lt;/a&gt; and &lt;a href="https://builderx.io/" rel="noopener noreferrer"&gt;BuilderX&lt;/a&gt; are also interesting on the designer’s end) can be used to build, distribute and collaborate over components to turn your design system into a consistent and positive user experience everywhere, and to manage and collaborate over components across teams within the organization.&lt;/p&gt;

&lt;p&gt;Thanks for reading!&lt;/p&gt;

</description>
      <category>design</category>
      <category>ui</category>
      <category>javascript</category>
      <category>frontend</category>
    </item>
    <item>
      <title>Do You Still Need a Component Library?</title>
      <dc:creator>Joni Sar</dc:creator>
      <pubDate>Thu, 08 Aug 2019 12:46:48 +0000</pubDate>
      <link>https://dev.to/jonisar/do-you-still-need-a-component-library-28kn</link>
      <guid>https://dev.to/jonisar/do-you-still-need-a-component-library-28kn</guid>
      <description>&lt;p&gt;&lt;strong&gt;&lt;em&gt;Let's rethink the way we share components to build our applications&lt;/em&gt;&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Today, front-end components in React, Vue and Angular let us compose applications through modular UI building blocks. A couple of years from now, &lt;a href="https://hackernoon.com/7-frontend-javascript-trends-and-tools-you-should-know-for-2020-fb1476e41083" rel="noopener noreferrer"&gt;framework-agnostic web components&lt;/a&gt; will take this to the next level.&lt;/p&gt;

&lt;p&gt;Yet, up until 2018 the way we shared and reused modular components wasn't very different from the way we shared entire projects. If we wanted to share a component from one repository to another, we would have to create a new repository to host it, move the code there, boilerplate it as a package, publish it, and install it as a dependency in the new project.&lt;/p&gt;

&lt;p&gt;That process is very hard to scale when it comes to smaller atomic components. It wasn't meant for components, it was meant for projects.&lt;/p&gt;

&lt;p&gt;So, teams began to struggle with sharing components, trying to reduce the overhead around the process. This often led to the creation of projects called "shared component libraries" (&lt;a href="https://hackernoon.com/23-best-react-ui-component-libraries-and-frameworks-250a81b2ac42" rel="noopener noreferrer"&gt;example&lt;/a&gt;) which are basically a single project with many components.&lt;/p&gt;

&lt;p&gt;But, in 2018 a new kind of sharing became possible: sharing components directly between projects, synced through a remote cloud-based collection. This was made possible thanks to a new open-source project called &lt;a href="https://github.com/teambit/bit" rel="noopener noreferrer"&gt;Bit&lt;/a&gt;, built for sharing smaller modules between larger projects.&lt;/p&gt;

&lt;p&gt;In this post, we'll try to explore the question "Do I still need a component library?" and present the pros and cons of different component-sharing workflows. Let's dive in.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pros and cons of a component library
&lt;/h2&gt;

&lt;p&gt;To better understand &lt;a href="https://blog.bitsrc.io/do-we-really-use-reusable-components-959a252a0a98" rel="noopener noreferrer"&gt;if a component library is the right choice&lt;/a&gt;, let's briefly review the pros and cons of building a component library. In short, the answer is: it depends :)&lt;/p&gt;

&lt;h3&gt;
  
  
  Pros of a component library
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Instead of setting up 30 more repositories for 30 more components, you can just have 1 external repository to host all 30 components.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Consolidate the development of shared components into one project: PRs, Issues etc.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Assign a clear owner to the components.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Enforcement of stacks and standards (double-edged sword).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Basically, the main advantage of a component library depends on the perspective. Compared to a repo-per-component approach, it saves overhead and consolidates the development and consumption of components into one repository and package. However, this can also be a downside. Let's review.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pains of a component library
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;If the components are internal to your apps, it will require heavy refactoring to move them to the library.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Consumers just need a single component, yet they are forced to install a whole library. &lt;a href="https://github.com/lerna/lerna" rel="noopener noreferrer"&gt;Lerna&lt;/a&gt; can help publish each component, but the overhead is heavy for many components.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;How will you version and update individual components? &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Discoverability for components is poor, so you have to invest in docs portals and maybe add tools like &lt;a href="https://github.com/storybookjs/storybook" rel="noopener noreferrer"&gt;Storybook&lt;/a&gt; or &lt;a href="https://codesandbox.io/" rel="noopener noreferrer"&gt;CodeSandbox&lt;/a&gt;. Still, how can you search for a button component with X dependencies and only Y kb in bundle size? (see &lt;a href="https://bit.dev" rel="noopener noreferrer"&gt;bit.dev&lt;/a&gt; below).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Component consumers can't make changes to the components without diving into the library and making a PR, then waiting for it to maybe get accepted. This often blocks the adoption of such libraries inside organizations. For many teams, this alone becomes a breaking point between the infra team building the library and the app developers consuming it. Collaboration over the components suffers.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;You enforce styles and other choices that don't fit every consuming app's use case, blocking the adoption of the library.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;You make it hard to handle dependencies between components: when you change a component, it's hard to tell which other components (in the library and elsewhere) are affected and how.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;You need to commit to additional tooling around the library to relieve some of the pains (basic discoverability, individual publishing etc).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A component library can be compared to a music album on CD-ROM (those of you over 25 will remember :). It's a static medium you carry around with you, holding ~30 items. You have to read the cover to learn what's inside, and you can't search for songs. You also can't change the content without burning the CD again. Over time, it takes some damage from ad-hoc adjustments and starts to wear off. Collaboration across teams is very difficult with libraries, which often fail to get adopted at scale.&lt;/p&gt;

&lt;p&gt;But, what if instead of a component CD album we can have a "component iTunes" - where we can easily share, discover, consume and update individual components from different projects? Keep reading.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sharing components in the cloud
&lt;/h2&gt;

&lt;p&gt;In 2018 an &lt;a href="https://github.com/teambit/bit" rel="noopener noreferrer"&gt;open-source project called Bit&lt;/a&gt; was first introduced on GitHub. &lt;/p&gt;

&lt;p&gt;Unlike the project-oriented tools we use for our projects (Git repos, package managers etc), Bit was built for atomic components.&lt;/p&gt;

&lt;p&gt;It lets us share JavaScript code between projects without having to set up more external repositories to do so (though we can if we want to; it also works for sharing code from a library to other projects). It manages changes to both source code and dependencies across projects.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://bit.dev" rel="noopener noreferrer"&gt;bit.dev&lt;/a&gt; is Bit's component hub. Like GitHub, it's free for open-source too (and for some private code). Through bit.dev, components become available to discvoer, use and sync across projects and teams.&lt;/p&gt;

&lt;p&gt;Let's quickly review.&lt;/p&gt;

&lt;h3&gt;
  
  
  Isolation and publishing
&lt;/h3&gt;

&lt;p&gt;When it comes to front-end components, Bit lets us automatically isolate components from a project (app or library) and wrap them in a contained environment that lets them run in other projects, out of the box. This environment contains all the files of the component, all its dependencies, and the configuration it needs to build and run outside of the project.&lt;/p&gt;

&lt;p&gt;This means we can individually share multiple components from a given project in little time, with zero to very little refactoring.&lt;/p&gt;

&lt;p&gt;Bit handles each component's versions and dependencies while extending Git to track changes to its source code, across projects.&lt;/p&gt;
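
&lt;p&gt;Conceptually, isolating a component means collecting its own files plus the files of every transitive dependency. This toy sketch (the registry and file names are invented, and Bit itself also captures build configuration and environments) shows the core graph walk:&lt;/p&gt;

```javascript
// A made-up registry: each component lists its files and dependencies.
const components = {
  button: { files: ['button.js', 'button.css'], deps: ['theme'] },
  theme:  { files: ['theme.js'],                deps: [] },
  navbar: { files: ['navbar.js'],               deps: ['button'] },
};

// Depth-first walk: gather the component's files, then recurse into
// each dependency, skipping anything already collected.
function isolate(name, registry, seen = new Set()) {
  if (seen.has(name)) return [];
  seen.add(name);
  const { files, deps } = registry[name];
  return files.concat(deps.flatMap((d) => isolate(d, registry, seen)));
}

console.log(isolate('navbar', components));
// Everything navbar needs to run outside its home project.
```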

&lt;h3&gt;
  
  
  Discoverability for components
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1hrrfm75zunmyktns7li.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1hrrfm75zunmyktns7li.png" width="800" height="440"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Through &lt;a href="https://bit.dev" rel="noopener noreferrer"&gt;bit.dev&lt;/a&gt; the components you share become discoverable to yourself and others to find, learn about and choose from.&lt;/p&gt;

&lt;p&gt;You can semantically &lt;strong&gt;search for components&lt;/strong&gt; by name, and filter results based on context-relevant labels, dependencies, bundle size and more useful parameters.&lt;/p&gt;
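
&lt;p&gt;That kind of filtered search is easy to picture as a query over component metadata. A hedged sketch with an invented catalog (not bit.dev's actual data model):&lt;/p&gt;

```javascript
// Hypothetical component metadata, roughly the parameters mentioned above.
const catalog = [
  { name: 'button',   labels: ['ui', 'form'],  sizeKb: 2 },
  { name: 'datagrid', labels: ['ui', 'table'], sizeKb: 48 },
  { name: 'tooltip',  labels: ['ui'],          sizeKb: 3 },
];

// Filter by label and maximum bundle size, return matching names.
const search = (items, query) =>
  items.filter((c) => c.labels.includes(query.label))
       .filter((c) => query.maxKb >= c.sizeKb)
       .map((c) => c.name);

console.log(search(catalog, { label: 'ui', maxKb: 10 }));
// logs: [ 'button', 'tooltip' ]
```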

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqs30jz6x74xikix6cwyi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqs30jz6x74xikix6cwyi.png" width="787" height="599"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq074zc5kaq0gzcc3gqa5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq074zc5kaq0gzcc3gqa5.png" width="794" height="583"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can quickly browse through components with visual snapshots, and when you go into a component's page you can try it hands-on in a live playground before using it in your project. You can also view the API docs, automatically parsed from the code, to learn how it works.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxjocdoll1i95es7q9zw5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxjocdoll1i95es7q9zw5.png" width="800" height="337"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Through &lt;a href="https://bit.dev" rel="noopener noreferrer"&gt;bit.dev&lt;/a&gt; components are visualized so that developers, product, designers and other stakeholders can collaborate and have universal access to all the components within the organization.&lt;/p&gt;

&lt;h3&gt;
  
  
  Component consumption and collaboration
&lt;/h3&gt;

&lt;p&gt;Once you find a component you like, shared by your team or the community for example, you can install it using package managers like npm and yarn.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2r9sz3pb985nl38z3nuq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2r9sz3pb985nl38z3nuq.png" width="478" height="214"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Updating components right from the consuming project...
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fawcucbw8dcipmxutas85.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fawcucbw8dcipmxutas85.png" width="511" height="226"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Bit also lets you &lt;code&gt;bit import&lt;/code&gt; a component (or a whole collection) to a new project. This means Bit will bring the component's actual source-code into the repository, while tracking the changes you make.&lt;/p&gt;

&lt;p&gt;You can then change something in the code, maybe a style for example, and tag the component with a new version. You can then share the new version back to the collection, and even pull the changes into any other repository the component is used in, while leveraging Git to merge the changes between the versions.&lt;/p&gt;

&lt;p&gt;Simply put, this means you can very quickly update a component right from your consuming app, so you don't have to dive into the library and wait on long PRs. While it requires some rules for collaboration (for example, choosing who can push new versions into the collection on bit.dev), it also means people can adopt the components and fit them to their needs. Otherwise, the component might just not be used (or just get copy-pasted and changed without anyone ever knowing about it :).&lt;/p&gt;

&lt;h2&gt;
  
  
  Component library + bit.dev together?
&lt;/h2&gt;

&lt;p&gt;Given the advantages of both approaches, many choose to combine their component library with the advantages of &lt;a href="https://github.com/teambit/bit" rel="noopener noreferrer"&gt;Bit&lt;/a&gt; and &lt;a href="http://bit.dev/" rel="noopener noreferrer"&gt;bit.dev&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;In this structure, the library functions as a development and staging area for the shared components. Bit and bit.dev are used to share the components, make them discoverable, and enable collaboration on top of them to drive their adoption in the real world.&lt;/p&gt;

&lt;p&gt;The best choice depends on your needs. For larger organizations with infra teams publishing components while other teams consume them, it's recommended to combine both: develop all components owned by the infra team in their repo, and make all of them individually discoverable to find, use and, given simple regulation, update as needed.&lt;/p&gt;

&lt;p&gt;For smaller teams or single developers trying to share a component between a couple of applications, a library might be overkill. You can just share components through your bit.dev collection, from one application to another, and keep them synced. You won't even need to refactor anything or add additional repositories to maintain.&lt;/p&gt;

&lt;p&gt;Bottom line, it's really up to you :)&lt;/p&gt;

&lt;p&gt;Cheers&lt;/p&gt;

</description>
      <category>react</category>
      <category>angular</category>
      <category>vue</category>
      <category>javascript</category>
    </item>
    <item>
      <title>When Writing Code Meets The Marshmallow Test</title>
      <dc:creator>Joni Sar</dc:creator>
      <pubDate>Mon, 29 May 2017 14:40:50 +0000</pubDate>
      <link>https://dev.to/jonisar/when-writing-code-meets-the-marshmallow-test</link>
      <guid>https://dev.to/jonisar/when-writing-code-meets-the-marshmallow-test</guid>
      <description>&lt;p&gt;One of my favorite videos around the web is this video of kids trying to resist eating a marshmallow after being promised a greater reward if they can hold on for a short while:&lt;/p&gt;

&lt;p&gt;This isn't only a funny video. It's also a fascinating experiment in human psychology that demonstrates one of our most important cognitive biases: &lt;strong&gt;&lt;a href="https://en.wikipedia.org/wiki/Time_preference" rel="noopener noreferrer"&gt;time preference&lt;/a&gt;&lt;/strong&gt;.&lt;/p&gt;

&lt;h1&gt;
  
  
  What is time preference
&lt;/h1&gt;

&lt;p&gt;Time preference basically means we value short-term rewards more than we value long-term ones. Practically speaking, it means we would rather have a single marshmallow right now than a whole bunch of them later. The longer the time difference, the worse it gets.&lt;/p&gt;

&lt;p&gt;As time preference is itself affected by time (shockingly), studies suggest that older people may suffer less from its pull. To help battle this bias without waiting to become wise elders, here are a few tips on how to identify and overcome it today.&lt;/p&gt;

&lt;h1&gt;
  
  
  Time preference and writing code
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fbm8ly6x10prdzvjssqr6.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fbm8ly6x10prdzvjssqr6.jpg" alt="alt text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Like many other tasks, writing code often means choosing between short-term and long-term value. Time preference might tempt us to choose immediate satisfaction over long-term value. Enforcing better practices upon ourselves often requires the activation of precious and limited mental resources such as willpower.&lt;/p&gt;

&lt;p&gt;Here are a few suggestions for how and when we can be aware of this bias when writing code, and how we can try to overcome it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Test-driven development
&lt;/h2&gt;

&lt;p&gt;We all know about TDD. We all know how important unit tests are. There is little doubt about how they help make sure nothing breaks when we change stuff, and how they affect the overall quality and maintainability of our applications.&lt;/p&gt;

&lt;p&gt;But, "Interrupting" our development flow to write tests in short cycles isn't always psychologically simple. Sometimes we just want to get the job done and worry about long-term stuff later. We can adopt different methodologies, but the curve for adoption often comes with a bit of a struggle.&lt;/p&gt;

&lt;p&gt;There are a few ways we can look at TDD to increase short-term value and satisfaction, balancing the equation a little more in our favor.&lt;/p&gt;

&lt;p&gt;A good example is the fact that tests help to &lt;strong&gt;decide when something is good enough&lt;/strong&gt;. They define the behavioral scope of different components, helping us to better understand what each of them is supposed to do and when it's good enough to be considered "working". This can actually help us &lt;strong&gt;save time&lt;/strong&gt; and stop optimizing things which already hit home. &lt;/p&gt;
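
&lt;p&gt;As a concrete illustration, a small spec written with plain assertions can serve as exactly that "good enough" boundary. The slugify component and its expected cases below are invented for the example:&lt;/p&gt;

```javascript
// An invented component: turn a title into a URL-friendly slug.
function slugify(title) {
  return title
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-')  // collapse non-alphanumeric runs
    .replace(/^-|-$/g, '');       // strip leading/trailing dashes
}

// The spec doubles as the definition of done: once every case
// passes, the component is good enough and we can stop polishing.
const spec = [
  ['Hello World', 'hello-world'],
  ['  Spaces  ', 'spaces'],
  ['Already-slugged', 'already-slugged'],
];
for (const [given, expected] of spec) {
  console.assert(slugify(given) === expected, `${given} failed`);
}
```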

&lt;p&gt;Also, don't be ashamed to take pride in work ethic and quality of practice. Seeing green indicators flash over our code and knowing we put in the effort to create something we're proud of isn't something to be taken lightly. Practice makes perfect, and good practice should make us proud.&lt;/p&gt;

&lt;h2&gt;
  
  
  Design for modularity and reusability
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Feqp8e7cxprb0ovdl12qj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Feqp8e7cxprb0ovdl12qj.png" alt="alt text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Building modular software out of smaller atomic functionalities &lt;a href="https://dev.to/jonisar/coding-in-the-age-of-code-components"&gt;offers many advantages&lt;/a&gt;. Designing our applications with these principles in mind &lt;a href="https://addyosmani.com/first/" rel="noopener noreferrer"&gt;makes for better and more maintainable software&lt;/a&gt;. Still, much like TDD, it might also require some additional thinking and effort right now.&lt;/p&gt;

&lt;p&gt;To help make life easier, we can try to generate short-term value and satisfaction from this practice. Designing with modularity in mind helps us better understand how our application is built and how every component fits into the bigger picture. Such a clear structure can help avoid writing stuff we don't need and helps get things ready for production quicker. Think twice, write once.&lt;/p&gt;

&lt;p&gt;Also, we can and should aim to make our modular components truly reusable as we work, not only by design.&lt;/p&gt;
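
&lt;p&gt;One practical signal that a helper is ready for export is that it no longer references anything app-specific. A small illustration (formatPrice is an invented example) using only built-in APIs, taking everything it needs as arguments instead of reaching into app globals:&lt;/p&gt;

```javascript
// Self-contained and reusable: no app config, no hidden state.
// Assumed defaults (USD, en-US) are just for the example.
function formatPrice(amount, currency = 'USD', locale = 'en-US') {
  return new Intl.NumberFormat(locale, {
    style: 'currency',
    currency,
  }).format(amount);
}

console.log(formatPrice(1999.5));               // "$1,999.50"
console.log(formatPrice(1999.5, 'EUR', 'de-DE'));
```

&lt;p&gt;A helper shaped like this can be dropped into any project, or exported as a standalone component, with zero refactoring.&lt;/p&gt;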

&lt;p&gt;Some add components to a general "util" library they drag across projects. Others keep a "waiting for export" directory, ready for exporting a reusable component every time they create one. Publishing every single component to package managers such as npm can consume much of our day (boilerplating etc.), thus creating more immediate negative value in the equation. To lower the barrier we can also use projects such as &lt;a href="https://github.com/teambit/bit" rel="noopener noreferrer"&gt;Bit&lt;/a&gt; and take advantage of the simplicity of export to create a growing arsenal of &lt;a href="https://bitsrc.io" rel="noopener noreferrer"&gt;reusable components&lt;/a&gt; to be shared with the open source community.&lt;/p&gt;

&lt;p&gt;Building an arsenal of open-source work is fun and generates clear visual feedback for our effort. It's also a great way to collaborate with others while getting feedback and improvement suggestions for our work. &lt;br&gt;
Giving to others is something to be proud of, and social metrics or downloads make us (biologically) feel the rush of what we did.&lt;/p&gt;

&lt;p&gt;In the long run, your code will also be easier to maintain and understand.&lt;/p&gt;

&lt;h2&gt;
  
  
  Short documentation cycle
&lt;/h2&gt;

&lt;p&gt;We often think of documentation in the context of explaining our code to the next person who will have to look at it. We know how important that is, and we've all tried to dive into someone else's code wishing they'd taken the extra time to write useful documentation.&lt;/p&gt;

&lt;p&gt;Taking the time to document every component isn't always simple. Future us or future others aren't always our first priority, and we get tempted to overlook it and put our mental resources into getting our code to work.&lt;/p&gt;

&lt;p&gt;There are a few ways to make this process more practical and satisfying in the short term. First, we can create a clear format for documenting different components and modules. Deciding on an exact format for the documentation helps us grasp what we're going to do and how long it will take, lowering the mental barrier to start writing. For example, when documenting a core JavaScript functionality we can work with a checklist of (a) a short description, (b) the signature (arguments, returns), (c) 1-3 examples, and so on. This makes it easier to repeat the process.&lt;/p&gt;
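&lt;p&gt;As a sketch of how that checklist might look in practice, here is a hypothetical utility documented with a JSDoc comment that follows the three steps (the function itself is made up for illustration):&lt;/p&gt;

```javascript
/**
 * (a) Description: counts how many times `char` appears in `str`.
 *
 * (b) Signature:
 * @param {string} str  - the string to search in
 * @param {string} char - a single character to look for
 * @returns {number} the number of occurrences
 *
 * (c) Examples:
 *   countChar('banana', 'a'); // => 3
 *   countChar('', 'a');       // => 0
 */
function countChar(str, char) {
  // split into characters and keep only the matches
  return str.split('').filter((c) => c === char).length;
}
```

Because the format is always the same, writing the next component's docs becomes a quick, repeatable task rather than an open-ended one.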

&lt;p&gt;We can also make sure our documentation builds a logical story. If we know that functionality A leads to B, which together work with C, we can "read" the story of our code, make sure everything makes sense, and check that we didn't add redundant chapters. Of course, modularity means every component will be independently documented. Still, chapters in a story should build upon one another in a logical way as much as possible. If they don't, the docs are a good way to find out.&lt;/p&gt;

&lt;p&gt;Good docs also help when publishing our work to our team or to the open source community, playing well with modularity and reusability. &lt;/p&gt;

&lt;p&gt;Unit tests can also work as part of a component's documentation. For example, if we have a simple &lt;a href="https://bitsrc.io/bit/utils/array/first" rel="noopener noreferrer"&gt;array-first&lt;/a&gt; JavaScript function, the tests can tell us that (a) when the array is empty it returns &lt;code&gt;null&lt;/code&gt; and (b) otherwise it returns the array's first value. The tests then also function as usage examples that help us better understand the different cases our code handles. Two marshmallows with one bite.&lt;/p&gt;
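&lt;p&gt;A minimal sketch of this idea: an illustrative version of an array-first function whose assertions read like usage documentation (the actual bit/utils component linked above may behave differently):&lt;/p&gt;

```javascript
// Returns the first element of an array, or null when there is none.
function first(array) {
  if (!Array.isArray(array) || array.length === 0) return null;
  return array[0];
}

// Tests that double as documentation:
console.assert(first([]) === null);        // (a) empty array => null
console.assert(first([1, 2, 3]) === 1);    // (b) otherwise => first value
console.assert(first(undefined) === null); // non-array input => null
```

Reading the three assertions is enough to understand the component's contract without opening its implementation.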

&lt;p&gt;At the end of the day, these are only suggestions. It's really up to us to know when and how we can gain immediate satisfaction from practices that also create long-term value. Over time, we develop our own understanding of "what works for us", and there isn't one rule that applies to everyone. In many ways, that's part of the beauty of it all. &lt;/p&gt;

&lt;p&gt;&lt;em&gt;"I can resist everything except temptation"&lt;/em&gt;&lt;br&gt;
    - Oscar Wilde&lt;/p&gt;

</description>
      <category>programming</category>
      <category>psychology</category>
      <category>coding</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Coding In The Age Of Code Components</title>
      <dc:creator>Joni Sar</dc:creator>
      <pubDate>Mon, 01 May 2017 09:18:00 +0000</pubDate>
      <link>https://dev.to/jonisar/coding-in-the-age-of-code-components</link>
      <guid>https://dev.to/jonisar/coding-in-the-age-of-code-components</guid>
      <description>&lt;p&gt;This is the age of code components. Web, React, Angular, Vue, and even Node components are the building blocks of pretty much everything these days. &lt;/p&gt;

&lt;p&gt;This makes sense. Software should be built by composing smaller, isolated functionalities together. &lt;a href="https://blog.bitsrc.io/introducing-bit-writing-code-in-the-age-of-code-components-fd8512a9aa90" rel="noopener noreferrer"&gt;Modularity and reusability are key for composability&lt;/a&gt;. When designing software, we should be designing a composition of smaller functionalities.  When I say small, I don't really mean X or Y lines of code. What I do mean is small in the sense that it handles a single focus or responsibility.  &lt;/p&gt;
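&lt;p&gt;One way to picture this composition principle is a handful of tiny, single-responsibility functions piped into a larger behavior (the names here are illustrative, not from any particular library):&lt;/p&gt;

```javascript
// Each function does exactly one thing.
const trim = (s) => s.trim();
const lower = (s) => s.toLowerCase();
const dashify = (s) => s.replace(/\s+/g, '-');

// pipe: composes functions left to right.
const pipe = (...fns) => (x) => fns.reduce((acc, fn) => fn(acc), x);

// The "application" is just a composition of the small parts.
const toSlug = pipe(trim, lower, dashify);

toSlug('  Hello World  '); // => 'hello-world'
```

Each piece stays small enough to test, document, and reuse on its own, while the composed function carries the actual feature.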

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fxkgrb64zxr8lx4ayty72.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fxkgrb64zxr8lx4ayty72.png" alt="alt text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  On Reusability And Modularity
&lt;/h1&gt;

&lt;p&gt;More and more, particularly when it comes to web components, it seems like designing isolated and reusable components is not only simpler than it used to be, it's sometimes the only &lt;em&gt;right&lt;/em&gt; way to design them.&lt;/p&gt;

&lt;p&gt;So, how come achieving &lt;strong&gt;true reusability&lt;/strong&gt; for code components remains such a challenge? The answer lies not in design, but in the question of &lt;strong&gt;how we create, find and use these components&lt;/strong&gt;. &lt;/p&gt;

&lt;h1&gt;
  
  
  Micro Packages Are Not The Answer
&lt;/h1&gt;

&lt;p&gt;Obviously, we don't want to be copy-pasting components everywhere. Duplications are very bad, and there is no need to elaborate. The problem is, up until now the only alternative to duplicating code was publishing these components as packages, or "micro-packages".&lt;/p&gt;

&lt;p&gt;I don't think small components should become packages. Packages don't make them practically reusable, and they add too much complexity. Here is why:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Initial overhead&lt;/strong&gt;: for every small component you would have to create a new repository and the package boilerplate (build, testing, etc.), and somehow make this process practical for a large set of components.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Maintenance&lt;/strong&gt;: modifying a repository and a package takes time and forces you to go through multiple steps such as cloning, linking, debugging, committing, republishing and so on. Build and install times quickly increase and dependency hell always feels near.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Discoverability&lt;/strong&gt;: it’s hard if not impossible to organize and search multiple repositories and packages to quickly find the components you need. People often use different terms to describe the same functionality, and there is no single source of truth to search and trust.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Making reusability practical
&lt;/h1&gt;

&lt;p&gt;I'll start at the end: we built an open source project to solve this problem. It's &lt;a href="https://github.com/teambit/bit" rel="noopener noreferrer"&gt;called Bit&lt;/a&gt;, and it enables you to quickly create reusable components during your workflow, export them to a distributed repository designed for code components called a &lt;a href="https://teambit.github.io/bit/bit-scope.html" rel="noopener noreferrer"&gt;Scope&lt;/a&gt; (which stores, organizes, manages, tests and builds your components), and then use them anywhere across repositories and applications. Components can be used as a virtual API, pulling nothing but the code actually used in your application.&lt;/p&gt;

&lt;p&gt;
  &lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fstorage.googleapis.com%2Fbit-assets%2Fgifs%2Fleftpad2.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fstorage.googleapis.com%2Fbit-assets%2Fgifs%2Fleftpad2.gif"&gt;&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;Bit was created to solve the three problems described above. To do so, it had to give components everything packages couldn't. It had to make them quick to create, simple to maintain and easy to find. Designed for code components, Bit introduces some new capabilities to make this not only possible but also practical:&lt;/p&gt;

&lt;p&gt;Bit comes with a reusable and isolated component environment. This environment is also configurable, saving the overhead of creating new boilerplate for new components (which can be done directly within any project you're working on). It also takes care of testing and building your components, using any framework you choose. &lt;/p&gt;

&lt;p&gt;To organize, store and manage your components, Bit uses "Scopes". A Bit Scope is a distributed, virtual layer of components on top of a source-code repository (and outside of it). You can export components to a Scope, where they will be stored, organized and made reusable. Scopes are distributed and can be used to organize components by context, collaborators or other criteria. Scopes make it simpler to maintain a large collection of reusable components, keeping them all in one place while modifying and using them individually.&lt;/p&gt;

&lt;p&gt;Bit also solves the discoverability problem using the Scoping organizational system, and features such as a built-in semantic search engine. &lt;/p&gt;

&lt;h1&gt;
  
  
  Distribution and centralization
&lt;/h1&gt;

&lt;p&gt;Distribution has many advantages, both practical and in keeping things separate from commercial interests. However, centralization has its own advantages, particularly around collaboration and convenience. To make it easier to work and collaborate as a team, we also built a free &lt;a href="https://www.bitsrc.io" rel="noopener noreferrer"&gt;community Hub called bitsrc&lt;/a&gt;. It's free for open source and always will be. &lt;a href="https://www.bitsrc.io/bit/utils" rel="noopener noreferrer"&gt;Here is an example Scope of utility functions&lt;/a&gt; I made with my team. &lt;/p&gt;

&lt;h1&gt;
  
  
  What now?
&lt;/h1&gt;

&lt;p&gt;Bit is working, but it's also a work in progress.&lt;br&gt;
For example, Bit is designed to be language agnostic and uses external drivers to work with different languages. Javascript is the first one we added, and more should soon follow. &lt;/p&gt;

&lt;p&gt;Other features should also be added, such as automatic dependency definition, source code indexing, component quality measurement, semi-automatic semantic versioning and more.&lt;/p&gt;

&lt;p&gt;For now, working with &lt;a href="https://github.com/teambit/bit" rel="noopener noreferrer"&gt;Bit&lt;/a&gt; and/or &lt;a href="https://www.bitsrc.io" rel="noopener noreferrer"&gt;bitsrc&lt;/a&gt; allows us to create, maintain and reuse a growing set of "building blocks" including Web and React components, utility functions, small modules and more. This not only speeds up work and prevents duplication, it also aligns with the basic principles of how software should be composed. One step at a time.&lt;/p&gt;

&lt;p&gt;Feel free to try it out for yourselves; contributions on GitHub are always welcome.&lt;/p&gt;

</description>
      <category>programming</category>
      <category>opensource</category>
      <category>softwareengineering</category>
      <category>webcomponents</category>
    </item>
  </channel>
</rss>
