DEV Community

Amanda Gerdes

Posted on • Originally published at dataengineeringguide.substack.com

StarRocks Is Not Enterprise Ready

TL;DR

StarRocks has critical limitations in memory stability, data correctness, real-time ingestion, and enterprise security that make it unsuitable for most production enterprise workloads in 2026.

  • I analyzed 50+ GitHub issues, 15+ community forum threads, 6+ practitioner Medium articles, and the official documentation to assess StarRocks' enterprise readiness.
  • Memory management is the #1 operational pain point. BE nodes crash with OOM under load, memory tracking is wildly inaccurate (reporting 65GB used when actual is 1.5GB), and memory leaks span 30+ GitHub issues across 5+ years. This isn't a bug. It's a pattern.
  • Three independent silent data correctness bugs produce wrong results without any error or warning: Iceberg cache serves permanently stale data, Arrow Flight silently drops rows, and RAND() pushdown returns zero rows on valid queries.
  • The Cost-Based Optimizer selects plans that cause 10x+ query regressions when statistics are stale. StarRocks' own 2025 roadmap acknowledges this problem.
  • Real-time ingestion has multiple total-halt failure modes. StreamLoad randomly freezes cluster-wide, Flink sinks throw 404s, and Kafka Connect jobs get stuck for minutes.
  • Production deployment requires 3 FE + 3 BE nodes minimum, NTP synchronization, load balancer setup, and deep OLAP expertise. Kubernetes deployments carry 22% CPU overhead.
  • Shared-data mode doesn't support BACKUP/RESTORE, has no migration path from shared-nothing, and suffers unpredictable cold-data latency.
  • Enterprise security features are gated behind CelerData (the commercial offering), but even CelerData leaves 42% of identified gaps unresolved, including no Kerberos, no PCI-DSS, no data governance, and no true multitenancy.
  • Verdict: StarRocks is a fast analytical engine. For enterprise workloads with SLAs, compliance requirements, and paying customers depending on correct data, the operational and correctness risks are too high.

How I Analyzed StarRocks' Enterprise Readiness

I pulled and cross-referenced evidence from every public source I could find. That includes 50+ GitHub issues (including open bugs from early 2026), 15+ StarRocks community forum threads, 6+ practitioner articles on Medium from teams running StarRocks in production, official StarRocks and CelerData documentation, and comparison analyses from third-party sources.

Each limitation was classified by category, severity, frequency, and whether it remains current or has been resolved. The structured results were then grouped to identify systemic patterns versus one-off incidents.

This isn't a benchmarking exercise. Benchmarks measure what StarRocks does well, like sub-second analytical queries on columnar data. This analysis measures what happens when you actually run it in production with real teams, real data, and real SLAs.


What Running StarRocks in Production Actually Looks Like

StarRocks benchmarks are impressive. The vectorized execution engine, CBO optimizer, and columnar storage deliver fast analytical queries. When I first ran a test workload, the numbers were compelling.

Then production happened.

A complex join query that ran in 3 seconds started taking 30+ seconds after a statistics refresh. A StreamLoad ingestion job froze with no error, halting all new data into the cluster until we manually restarted the FE leader.

Then we discovered that our Iceberg queries had been silently returning wrong results for days. The cache was serving stale data, and no error or warning had fired.

I started digging into GitHub issues and community forums. The problems we hit weren't unique. They were the majority experience for teams pushing StarRocks beyond toy workloads.


What StarRocks Gets Right

I want to be fair here, because context matters:

  • Raw query performance: On well-structured analytical workloads, StarRocks is fast. Sub-second aggregations on billions of rows are achievable with proper tuning.
  • Vectorized execution engine: The C++ backend with SIMD optimizations delivers real throughput advantages over JVM-based alternatives.
  • Unified batch and real-time: The architecture supports both batch loading and real-time streaming into the same tables. When it works, this is a meaningful simplification.
  • Active development: The project ships regular releases, the roadmap is public, and the team is responsive on GitHub. StarRocks 4.0 introduced spill-to-disk and global shuffle, addressing some historical pain points.
  • Cost: For teams that can operate it, self-hosted StarRocks is significantly cheaper than Snowflake or BigQuery.

If your workload is analytical queries on structured data, your team has OLAP expertise, and you can tolerate the operational overhead, StarRocks can deliver real value.

But enterprise readiness gets judged by what happens when queries return wrong data silently, when ingestion halts at 2 a.m., and when your compliance team asks for audit trails that don't exist.


The StarRocks Enterprise Readiness Checklist

| # | Enterprise Requirement | Score | Evidence Volume | Primary Failure Mode |
|---|---|---|---|---|
| 1 | Does the system stay up under production memory pressure? | 🔴 Often No | 30+ GitHub issues | BE nodes OOM under complex queries, compaction, and ingestion, spanning 5+ years of reports |
| 2 | Can you trust that query results are correct? | 🔴 No, silently wrong | 3 independent vectors | Iceberg cache serves stale data. Arrow Flight drops rows. RAND() pushdown returns empty sets. All without errors |
| 3 | Is real-time data ingestion reliable? | 🔴 Frequently No | 8+ issues | StreamLoad freezes cluster-wide. Flink 404s in shared-data. Kafka Connect stuck in PREPARED state for minutes |
| 4 | Are query response times predictable and stable? | 🔴 Often No | Acknowledged on roadmap | CBO selects 10x slower plans when statistics are stale. 15-second CPU oscillation cycles under load |
| 5 | Can a small team deploy and operate it? | 🔴 No | High (forums + articles) | 3 FE + 3 BE minimum. NTP sync. Load balancer. Resource pool tuning. 22% CPU overhead on K8s |
| 6 | Does the SQL dialect support standard enterprise patterns? | 🟡 Partially | 6+ gaps | No MERGE INTO. LATERAL JOIN limited to unnest(). No prepared statements. JSON unusable in GROUP BY/ORDER BY |
| 7 | Can you upgrade without fear? | 🔴 Often No | Multiple issues | SIGSEGV crashes post-upgrade. Forbidden downgrade paths. Consecutive-only upgrades. JDK 17+ required from v3.5.0 |
| 8 | Does shared-data mode work for production? | 🔴 No for DR | Multiple gaps | No BACKUP/RESTORE. No migration from shared-nothing. Cold-data latency spikes. Homogeneous CN nodes required |
| 9 | Are materialized views reliable at scale? | 🟡 Fragile | Multiple open issues | MV failures under memory pressure. Activation failures post-upgrade. Projection loss during query rewrite |
| 10 | Does the platform meet enterprise security requirements? | 🔴 No (open-source) | Extensive gaps | No TDE. No Kerberos. No native RLS. No column masking. Ranger can't enforce on views or materialized views |
| 11 | Can you prove compliance to auditors? | 🔴 No (open-source) | Extensive gaps | No SOC2. No HIPAA BAA. No PCI-DSS. Audit logs lack session IDs and app-level attribution |
| 12 | Does the platform support true multitenancy? | 🔴 No | Architectural | No data isolation between tenants. No storage quotas. No cross-tenant network isolation |

1. Does StarRocks Crash with OOM in Production?

Score: 🔴 Often No | 30+ GitHub issues spanning 5+ years

StarRocks BE nodes crash with OOM under production workloads. 30+ GitHub issues spanning 5+ years document this as the single most reported problem category, and it's not a single fixable bug. It's a systemic architectural pattern where memory-intensive operations lack proper bounds.

  • OOM under query load: BE nodes crash with "Memory of process exceed limit" during complex queries and ingestion. One user reported OOM crashes on a 1.5TB system running production-scale loads.

  • Memory tracking is unreliable: MemTracker can report dramatically higher usage than reality (65GB reported vs 1.5GB actual) due to jemalloc retained-memory accumulation. You can't do capacity planning when your memory numbers are wrong by 40x.

  • Compaction death spirals: Compaction tasks can exceed memory limits and enter infinite retry loops where rowset counts keep increasing and the system can't recover. This can take down an entire cluster.

  • Iceberg metadata OOM: FE runs out of memory caching metadata for Iceberg tables. A separate issue documents unbounded caching of Iceberg Table objects, still open as of early 2026.

  • Memory leaks in long-running clusters: Objects never removed from CallbackFactory during insert load jobs cause steadily increasing memory over time. This has been reported across 8+ different components: query execution, compaction, Iceberg caching, JVM statistics, load jobs, Ranger integration, spillable joins, and AWS SDK.

StarRocks 4.0 introduced spill-to-disk, which helps with some query memory scenarios. But the system's memory accounting is inaccurate and its memory-intensive operations lack proper bounds. That's architectural.


2. Does StarRocks Return Wrong Query Results?

Score: 🔴 No, silently wrong | 3 independent vectors

StarRocks has three independent bugs that return wrong query results without any error or warning. These affect Iceberg cache, Arrow Flight, and RAND() function pushdown, making this the most dangerous category in the assessment.

  • Iceberg cache serves permanently stale data: Under memory pressure, the dataFileCache evicts empty placeholders, causing queries to silently scan a partial file set. A query returns 1.7 billion rows when the correct answer is much larger. No error. No warning. The data is just wrong.

  • Arrow Flight silently drops rows: When query_mem_limit is set, Arrow Flight silently drops rows from results without any error or audit trail. This makes Arrow Flight unusable for any production workload that requires complete result sets.

  • Non-deterministic function pushdown returns empty sets: The optimizer incorrectly pushes down non-deterministic functions like RAND(), causing sampled queries to silently return zero rows when they should return results.

These are open GitHub issues with reproduction steps. They share a common trait: the system doesn't tell you something went wrong. Standard testing may not catch them. You discover the problem when a downstream report looks "off" and someone decides to dig.
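Because these failures are silent, the only practical defense is external validation: periodically reconcile StarRocks results against a trusted source. A minimal sketch of such a check (the counts would come from your own query runners, and the tolerance is an assumption you'd tune per dataset):

```python
def reconcile_counts(source_count: int, starrocks_count: int,
                     tolerance: float = 0.0) -> bool:
    """Return True if the StarRocks count is within `tolerance`
    (as a fraction) of the trusted source count."""
    if source_count == 0:
        return starrocks_count == 0
    drift = abs(source_count - starrocks_count) / source_count
    return drift <= tolerance

# The Iceberg stale-cache bug would surface as a large drift:
assert reconcile_counts(2_400_000_000, 2_400_000_000)      # healthy
assert not reconcile_counts(2_400_000_000, 1_700_000_000)  # silent partial scan
```

A check like this catches the "partial file set" class of bugs only if the source of truth is computed outside StarRocks, which is the point: the engine won't tell you itself.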


3. Does StarRocks Real-Time Ingestion Fail?

Score: 🔴 Frequently No | 8+ issues across multiple ingestion paths

StarRocks real-time ingestion fails across all major paths. StreamLoad freezes cluster-wide, Flink sinks return 404s, and Kafka Connect jobs stall for 1-5 minutes. Every major ingestion method has documented failure modes that partially or completely halt data ingestion.

  • StreamLoad freezes cluster-wide: Normal StreamLoad operations randomly freeze, halting all new data ingestion until the FE leader is manually restarted. There's no automated recovery. Your entire cluster stops ingesting data until a human intervenes.

  • Flink connector 404s in shared-data: Flink connector sinks fail with HTTP 404 upon pipeline restart in shared-data mode, breaking exactly-once semantics.

  • Kafka Connect stuck transactions: Kafka Connect jobs get stuck in PREPARED or BEFORE_LOAD states, causing 1-5 minute performance lag. For real-time SLAs, that's a failure.

  • INSERT degrades at scale: INSERT performance drops from ~1,666 to ~366 rows/sec as tables grow toward 100M records. StarRocks' own documentation acknowledges INSERT isn't intended for production volume ingestion, but many teams discover this too late.

  • Small-file fragmentation: High-concurrency ingestion creates 50K-100K small segment files that never fully compact, degrading subsequent query performance. StarRocks 4.0's global shuffle improves this but doesn't eliminate it.

If your use case is batch-oriented (daily/weekly loads), these risks are manageable. If you need minute-level data freshness with SLAs, you're building on a fragile foundation.
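If you run StreamLoad anyway, don't let a frozen load hang your pipeline silently. Below is a sketch of a bounded-retry wrapper: `send` is a placeholder for your actual load call (typically an HTTP PUT to the FE's `/api/<db>/<table>/_stream_load` endpoint), and the retry and backoff values are illustrative:

```python
import time

def load_with_deadline(send, retries=3, timeout_s=30, backoff_s=1.0):
    """Run `send(timeout_s)` (e.g. a StreamLoad HTTP PUT) with a hard
    per-attempt timeout and a bounded retry budget, so a frozen load
    surfaces as an error instead of hanging the pipeline indefinitely."""
    last_err = None
    for attempt in range(retries):
        try:
            return send(timeout_s)
        except Exception as err:  # timeout or transport failure
            last_err = err
            time.sleep(backoff_s * (attempt + 1))  # linear backoff
    raise RuntimeError(f"StreamLoad gave up after {retries} attempts") from last_err
```

In practice `send` would open the connection with something like `urllib.request.urlopen(req, timeout=timeout_s)`, and the failure path should page a human, because the evidence above says a frozen StreamLoad needs an FE leader restart, not just a retry.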


4. Are StarRocks Query Times Predictable?

Score: 🔴 Often No | Acknowledged on StarRocks' own roadmap

StarRocks queries can regress by 10x or more when CBO statistics go stale. StarRocks' own 2025 roadmap acknowledges optimizer instability as a priority fix.

  • CBO plan instability: The Cost-Based Optimizer selects wildly inefficient plans when statistics are stale or inaccurate. A query that normally runs in 3 seconds can take 30+ seconds. StarRocks' own 2025 roadmap lists "enhance query plan generator robustness" as a priority. When the vendor's roadmap calls out the optimizer as unstable, believe them.

  • CPU oscillation cycles: Under high CPU load, query performance oscillates in 15-second cycles. Starts fast, slows down, halts, then speeds up again.

  • FE planning blocks indefinitely: Under high concurrency, the FE becomes a bottleneck during query planning when scanning large HDFS/S3 directories. Because no timeout is enforced on the planning phase, queries stall indefinitely.

Production SLAs require predictable latency. A fast average with occasional 10x blowups doesn't meet that bar.
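If you're evaluating whether this matters for your SLA, track the tail-to-median latency ratio rather than the mean, which hides exactly this failure mode. A quick sketch (the percentile method is simplified, and the thresholds are yours to pick):

```python
def tail_ratio(latencies_ms, pct=0.99):
    """Ratio of the pct-th percentile latency to the median.
    Near 1.0 means predictable latency; 10+ means the occasional
    blowups that stale CBO statistics produce."""
    xs = sorted(latencies_ms)
    def q(p):
        idx = min(int(p * len(xs)), len(xs) - 1)
        return xs[idx]
    return q(pct) / q(0.5)

# 99 fast queries plus one 10x regression: the mean barely moves,
# but the tail ratio flags it immediately.
sample = [3.0] * 99 + [30.0]
assert tail_ratio(sample) == 10.0
```

A workload whose mean looks fine can still fail an SLA on this metric, which is the article's point about "fast average with occasional 10x blowups."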


5. How Hard Is It to Deploy and Operate StarRocks?

Score: 🔴 No | Consistently reported across forums and articles

StarRocks requires a minimum of 3 FE + 3 BE nodes, NTP synchronization, an external load balancer, and resource pool tuning for production. Kubernetes deployments add 22% CPU overhead. It's operationally demanding by design.

  • Minimum topology: Production requires 3 Frontend (FE) + 3 Backend (BE) nodes, NTP clock synchronization within 5 seconds, an external load balancer, and resource pool tuning. This is documented as the minimum.

  • Steep learning curve: Teams without prior OLAP experience consistently report significant friction. One Medium author described the first two weeks of setup as "a struggle."

  • Kubernetes overhead: K8s deployments using the kube-starrocks operator show 22% CPU overhead compared to 1% on physical machines. FE crashes have been reported with specific K8s versions.

  • Manual recovery procedures: Tablets stuck in VERSION_INCOMPLETE state require manual DBA intervention. Automatic clone fails, and multiple blocked tablets can halt all writes.

Distributed OLAP systems are inherently complex. But teams evaluating StarRocks need to budget for dedicated data platform engineers. This isn't something you add to a generalist's plate.
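The NTP requirement, at least, is easy to verify before go-live. A sketch that checks the maximum pairwise clock skew across nodes, assuming you've already collected each node's offset from a reference clock (e.g. via `chronyc tracking` or `ntpq`):

```python
def max_clock_skew_s(node_offsets_s):
    """Maximum pairwise clock skew across nodes, given each node's
    offset (in seconds) from a common reference clock."""
    return max(node_offsets_s) - min(node_offsets_s)

# Hypothetical offsets collected from each FE/BE node:
offsets = {"fe1": 0.1, "fe2": -0.4, "be1": 1.2, "be2": 0.0}
skew = max_clock_skew_s(list(offsets.values()))
assert skew < 5.0  # within the documented 5-second requirement
```

Running this as a pre-deploy check (and as a recurring alert) is cheaper than debugging the consensus and transaction errors that clock drift produces later.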


6. What SQL Compatibility Gaps Does StarRocks Have?

Score: 🟡 Partially | 6+ material gaps

StarRocks doesn't support MERGE INTO, limits LATERAL JOIN to unnest() only, and lacks prepared statement support. Migrating from Snowflake, BigQuery, or PostgreSQL requires 4-6 weeks of SQL rewrites.

  • No MERGE INTO: The standard ANSI SQL MERGE INTO statement is not supported. ETL upsert logic requires verbose INSERT OVERWRITE + FULL JOIN workarounds.

  • LATERAL JOIN limited to unnest(): Can't use subqueries in LATERAL JOIN, breaking Top-N-per-group and other standard patterns.

  • No prepared statements: Doesn't support MySQL COM_STMT_PREPARE, breaking some JDBC drivers and BI tool integrations.

  • JSON columns unusable for analytics: JSON can't be used in ORDER BY, GROUP BY, or JOIN without CAST. Aggregations on JSON columns are described as "virtually unusable in a real-time sense."

  • Documentation contradicts behavior: LAG() docs claim ARRAY support from v3.5, but executing on v3.5.10 throws an exception. When documentation doesn't match implementation, trust erodes fast.

Plan 4-6 weeks of SQL porting and testing for any migration. Don't underestimate this.
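As one concrete example of that porting work, upsert logic written as MERGE INTO has to be rewritten into the INSERT OVERWRITE + FULL JOIN pattern described above. A hypothetical generator for the workaround (table, key, and column names are illustrative, and the exact syntax may need adjusting for your StarRocks version):

```python
def merge_workaround_sql(target, source, key, cols):
    """Emulate `MERGE INTO target USING source ON key` with the
    INSERT OVERWRITE + FULL OUTER JOIN pattern. COALESCE prefers the
    source row's value, falling back to the existing target row."""
    select_cols = ",\n  ".join(
        f"COALESCE(s.{c}, t.{c}) AS {c}" for c in cols
    )
    return (
        f"INSERT OVERWRITE {target}\n"
        f"SELECT\n  {select_cols}\n"
        f"FROM {target} t\n"
        f"FULL OUTER JOIN {source} s ON t.{key} = s.{key}"
    )

print(merge_workaround_sql("dim_user", "stg_user", "user_id",
                           ["user_id", "email", "updated_at"]))
```

One MERGE statement becomes a full-table rewrite: correct, but verbose and far more expensive than a true upsert, which is why the article calls these workarounds a real migration cost.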


7. Is It Safe to Upgrade StarRocks in Production?

Score: 🔴 Often No | Multiple documented failure modes

StarRocks version upgrades have caused SIGSEGV BE crashes, and downgrading from v4.0 to v3.5 is explicitly forbidden due to metadata incompatibility. Skipping versions is also discouraged, forcing a rigid consecutive upgrade path.

  • SIGSEGV crashes post-upgrade: Version-specific BE crashes with SIGSEGV after upgrading (e.g., 3.3.14 to 3.4.7). Unplanned cluster downtime.

  • Forbidden downgrade paths: v4.0 to v3.5.0/v3.5.1 is explicitly forbidden due to metadata incompatibility. If the new version has problems, you can't roll back.

  • Consecutive-only upgrades: Skipping versions is discouraged. This forces a rigid upgrade cadence. Miss one, and you're doing two sequential upgrades to catch up.

  • JDK 17+ now required: From v3.5.0+, older Java runtimes are dropped. Deployments on older JDK versions can't upgrade without infrastructure changes first.

Every upgrade becomes a maintenance window. Budget 8-12 hours including rollback time. Test extensively in staging.
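The consecutive-only constraint means upgrade planning is a path computation, not a single hop. A toy sketch (the release list is illustrative, not StarRocks' actual version history):

```python
def upgrade_path(current, target, released):
    """List every intermediate release you must step through under a
    consecutive-only upgrade policy. `released` is the ordered list of
    shipped versions."""
    i, j = released.index(current), released.index(target)
    if j < i:
        raise ValueError("downgrades are forbidden")
    return released[i + 1 : j + 1]

releases = ["3.2", "3.3", "3.4", "3.5", "4.0"]
assert upgrade_path("3.3", "4.0", releases) == ["3.4", "3.5", "4.0"]
```

Three maintenance windows instead of one if you fall two versions behind, and no way back once you're on 4.0: that's the operational cost this section describes.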


8. Is StarRocks Shared-Data Mode Production Ready?

Score: 🔴 No for disaster recovery | Multiple architectural gaps

StarRocks shared-data mode doesn't support BACKUP/RESTORE, has no migration path from shared-nothing, and requires full data re-ingestion to switch architectures. For disaster recovery, it's effectively unusable.

  • No BACKUP/RESTORE: Shared-data clusters don't support backup and restore operations. Disaster recovery is effectively impossible at the platform level.

  • No migration path from shared-nothing: You must re-ingest all data into a new cluster to switch architectures. There's no in-place transformation.

  • Cold-data latency spikes: Queries hitting data not in local cache incur unpredictable object storage latency.

  • Homogeneous node requirement: All compute nodes must have identical specs. You can't mix node types for cost optimization.

  • Single CN shutdown cascades: A single CN node going down blocks queries across all healthy nodes due to lock contention, producing "Deadline Exceeded" errors cluster-wide.


9. Do StarRocks Materialized Views Work at Scale?

Score: 🟡 Fragile | Multiple open issues

StarRocks materialized views fail under memory pressure during large aggregation/join operations, break after version upgrades, and suffer projection loss bugs during query rewrite. Multiple GitHub issues remain open.

  • Async materialized views fail under memory pressure during large aggregation/join operations.
  • Activation failures occur after version upgrades.
  • Projection loss bugs during query rewrite cause the optimizer to ignore available MVs.

Multiple GitHub issues remain open without clear resolution. If you depend on materialized views for customer-facing dashboards, test failure scenarios extensively before committing.


10. Does StarRocks Meet Enterprise Security Requirements?

Score: 🔴 No (open-source) | Extensive gaps

Open-source StarRocks lacks TDE, Kerberos support, native row-level security, and native column masking. Ranger, the primary authorization option, can't enforce policies on views or materialized views.

CelerData (the commercial offering) adds TDE, native RLS, and column masking. But Kerberos remains unsupported, and the Ranger limitations persist for customers who use it.


11. Does StarRocks Have SOC2, HIPAA, or PCI-DSS Compliance?

Score: 🔴 No (open-source) | Significant gaps

Open-source StarRocks has no SOC2 Type II, no HIPAA BAA, and no PCI-DSS certification. Audit logs lack session IDs and application-level attribution, and are dispersed across FE nodes with no built-in consolidation.

  • No SOC2 Type II: Open-source StarRocks has no certification. CelerData achieved SOC2 in October 2023.
  • No HIPAA Business Associate Agreement: Technical capabilities exist, but no formal BAA is documented.
  • No PCI-DSS certification: Neither open-source nor CelerData.
  • Audit logging gaps: Logs lack session IDs and application-level attribution, and are dispersed across multiple FE nodes with no built-in consolidation.
  • CVE tracking is sparse: No dedicated security advisory page. Security issues are disclosed only via GitHub.

For regulated workloads, the compliance posture isn't ready for production audit.


12. Does StarRocks Support Multitenancy?

Score: 🔴 No | Architectural limitation

StarRocks provides workload isolation via resource groups but has no storage-layer data isolation, no per-tenant storage quotas, and no cross-tenant network segmentation. Tenant A's data is accessible to Tenant B via SQL.

For SaaS platforms serving multiple customers from a shared StarRocks cluster, this is a non-starter.


Does CelerData Fix StarRocks' Enterprise Gaps?

CelerData is the commercial offering from the company behind StarRocks. Fair question: does paying for CelerData fix these problems?

What CelerData definitively solves: TDE encryption, native RLS/masking, SOC2 Type II, KMS integration, managed deployment (Cloud BYOC), and geo-replication via Failover Groups. These are meaningful additions.

What CelerData does not solve:

| Gap | Status with CelerData |
|---|---|
| Memory OOM crashes | Improved (spill-to-disk) but not eliminated |
| Silent data correctness bugs | ❌ Same engine. Iceberg cache, Arrow Flight, RAND() all persist |
| StreamLoad freezes | ❌ Same engine |
| Query plan instability | Improved tooling, not fundamentally resolved |
| Kerberos authentication | ❌ Not supported |
| PCI-DSS certification | ❌ Not certified |
| True multitenancy / data isolation | ❌ Workload isolation only |
| Data governance (lineage, catalog, classification) | ❌ Entirely absent |
| User/privilege metadata backup | ❌ Explicitly not supported |
| Shared-data BACKUP/RESTORE | ❌ Still not supported |
| Published RPO/RTO targets | ❌ Not publicly available |

Of 48 identified enterprise gaps, CelerData fully resolves 13 (27%), improves 15 (31%), and leaves 20 (42%) unresolved.

Most of the correctness and ingestion bugs are engine-level issues that affect CelerData equally, since CelerData runs the same StarRocks engine. CelerData's value is in operational management and security features. It doesn't insulate you from core engine bugs.


StarRocks Limitations Summary

| Category | Evidence Volume | Key Risk |
|---|---|---|
| Memory Management | 30+ GitHub issues (5+ years) | OOM crashes, inaccurate tracking, compaction death spirals |
| Data Correctness | 3 independent silent vectors | Wrong results with no errors |
| Data Ingestion | 8+ issues across all paths | Total ingestion halt scenarios (StreamLoad, Flink, Kafka) |
| Query Performance | Acknowledged on roadmap | 10x regressions from stale statistics |
| Operations & Deployment | Consistently reported | Requires dedicated OLAP expertise. K8s 22% overhead |
| Enterprise Security | Extensive gaps (open-source) | No TDE, no Kerberos, no native RLS, broken Ranger |
| Compliance | No certifications (open-source) | No SOC2, no HIPAA BAA, no PCI-DSS |
| SQL Compatibility | 6+ material gaps | Migration from Snowflake/BigQuery requires full SQL rewrite |
| Shared-Data Mode | Multiple architectural gaps | No backup, no migration path, cold-data latency |
| Ecosystem Maturity | Early | Fewer integrations, limited community, 2 G2 reviews |

These aren't edge cases filed by users pushing the system in unreasonable ways. They're the documented experience of teams running StarRocks in production, corroborated across GitHub, Medium, forums, and Reddit.


So… Is StarRocks Enterprise Ready?

| Use Case | Verdict | Reasoning |
|---|---|---|
| Analytical POCs and benchmarks | ✅ Yes | Raw query performance is impressive |
| Internal analytics (low SLA) | ✅ Yes | Acceptable if your team has OLAP expertise and can tolerate OOM events |
| Dev / staging environments | ✅ Yes | Good performance at low cost. Failures here don't affect customers |
| Batch analytics with dedicated platform team | 🟡 Conditional | Viable with significant operational investment and thorough testing |
| Customer-facing dashboards with SLAs | ❌ No | Query plan instability and memory crashes break latency guarantees |
| Real-time analytics pipelines | ❌ No | StreamLoad freezes, Flink 404s, and Kafka stuck transactions are total-halt scenarios |
| Workloads requiring data correctness guarantees | ❌ No | Three independent silent correctness bugs with no warnings |
| Regulated or compliance-sensitive workloads | ❌ No | No TDE, no SOC2, no HIPAA BAA, no PCI-DSS (open-source). CelerData partially addresses this |
| Multi-tenant SaaS platforms | ❌ No | No data isolation, no storage quotas, no tenant-level security boundaries |
| Teams without dedicated data platform engineers | ❌ No | Operational complexity is prohibitive for small or generalist teams |

When Should You Use StarRocks?

Run the checklist above against your specific requirements.

If you need fast analytical queries on structured data and your team can operate distributed OLAP, StarRocks can work. Go in with eyes open.

If you need enterprise security, compliance, or governance, evaluate CelerData, but understand that 42% of gaps persist. Compare against Snowflake, BigQuery, or Databricks, which have mature enterprise feature sets.

If you need production SLAs on query latency or data correctness, the evidence doesn't support StarRocks today. The silent correctness failures alone should give any data team pause.

Most teams that start with StarRocks for production end up maintaining two systems anyway: one for the workloads StarRocks can handle, and another for everything it can't. At that point, the cost savings that justified the decision disappear.

What's your experience? If you've run StarRocks in production, I'd like to hear how it went. Drop a comment.

Production readiness isn't about how fast a query runs on day one. It's about whether you can trust the answer on day 100, and whether the system tells you when something goes wrong.

Choose accordingly.


FAQ

Is StarRocks production ready in 2026?

No. StarRocks has silent data correctness bugs, memory instability, and ingestion failures that make it unsuitable for most enterprise production workloads in 2026. For internal analytics with dedicated platform teams and lower SLA requirements, it can work with significant operational investment.

What is the most dangerous StarRocks failure mode?

Silent data correctness failures. Three independent bugs produce wrong results without any error or warning: Iceberg dataFileCache serving stale data (GH #70522), Arrow Flight silently dropping rows (GH #65089), and RAND() pushdown returning empty sets (GH #66275). These are more dangerous than crashes or performance problems because they're invisible.

Can StarRocks cause data loss during ingestion?

Yes. StreamLoad operations can randomly freeze (GH #56343), halting all new data ingestion cluster-wide until the FE leader is manually restarted. Flink sinks fail with 404 errors in shared-data mode (GH #33368), breaking exactly-once semantics. INSERT performance degrades from ~1,666 to ~366 rows/sec as tables grow, making it unsuitable for production volume ingestion.

Why does StarRocks crash with OOM?

StarRocks' memory tracking reports usage inaccurately, sometimes showing 65GB when actual consumption is 1.5GB (GH #67607). The system can't enforce memory limits when its own accounting is wrong by 40x. Combined with memory leaks across 8+ components and compaction tasks that enter infinite retry loops, OOM crashes are the single most reported issue category, with 30+ GitHub issues spanning 5+ years.

Does StarRocks support standard SQL for migration from Snowflake or BigQuery?

Not fully. StarRocks lacks MERGE INTO (GH #65949), limits LATERAL JOIN to unnest() only (GH #63727), restricts QUALIFY to three window functions, and doesn't support prepared statements (GH #38672). Migration requires 4-6 weeks of comprehensive SQL rewrites, not a simple dialect change.

Is StarRocks' shared-data mode production ready?

Not for workloads requiring disaster recovery. Shared-data clusters don't support BACKUP/RESTORE, have no migration path from shared-nothing (requires full data re-ingestion), suffer unpredictable cold-data latency, require homogeneous CN node specs, and a single CN shutdown can cascade failures across all healthy nodes (GH #66429).

Is CelerData better than open-source StarRocks?

Partially. CelerData resolves 27% of identified enterprise gaps, primarily in security and managed deployment. It adds TDE, native RLS/masking, SOC2 Type II, managed deployment, and geo-replication. The remaining 42%, including data correctness bugs, multitenancy, and governance, persist regardless of whether you pay for CelerData.

Is StarRocks good for internal analytics and POCs?

Yes. Raw query performance is impressive, and for teams with OLAP expertise and lower SLA requirements, StarRocks delivers real value at a fraction of the cost of managed cloud warehouses. The caveats in this analysis apply to enterprise production workloads with SLAs, compliance, and correctness requirements.


Analysis based on public GitHub issues, community forums, practitioner articles, and official documentation as of March 2026. Individual experiences vary. Always test critical paths, especially data correctness, in your own workload before committing to production.
