<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Whitney </title>
    <description>The latest articles on DEV Community by Whitney  (@wtrue).</description>
    <link>https://dev.to/wtrue</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1228535%2F98fa3749-c545-4d9b-bae1-e1c01ace0a9b.png</url>
      <title>DEV Community: Whitney </title>
      <link>https://dev.to/wtrue</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/wtrue"/>
    <language>en</language>
    <item>
      <title>Lessons from Log4Shell: Building a CRA-Ready Log4j</title>
      <dc:creator>Whitney </dc:creator>
      <pubDate>Wed, 06 May 2026 22:01:56 +0000</pubDate>
      <link>https://dev.to/theasf/lessons-from-log4shell-building-a-cra-ready-log4j-j43</link>
      <guid>https://dev.to/theasf/lessons-from-log4shell-building-a-cra-ready-log4j-j43</guid>
      <description>&lt;p&gt;&lt;em&gt;By: Piotr P. Karwasz, VP Logging, Apache Software Foundation&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The disclosure of Log4Shell (CVE-2021-44228) on December 9, 2021, did not just expose a vulnerability: it exposed a way of building software that was no longer fit for purpose, and it helped bring the European Cyber Resilience Act into being.&lt;/p&gt;

&lt;p&gt;I recently hosted a session for the Open Regulatory Compliance community’s CRA Monday series to tell the story from the inside: what the Apache Logging team actually did in the years after Log4Shell to rebuild the project as something CRA-ready.&lt;/p&gt;

&lt;p&gt;This blog recaps and expands upon that session; you can also &lt;a href="https://www.youtube.com/watch?v=ns9RBhEsz_U" rel="noopener noreferrer"&gt;watch the recording&lt;/a&gt; or &lt;a href="https://github.com/orcwg/orcwg/tree/main/events/cra-mondays" rel="noopener noreferrer"&gt;view the slides&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Wake-Up Call for the Software Ecosystem
&lt;/h2&gt;

&lt;p&gt;Log4Shell’s impact was unprecedented in scale. Apache Log4j is embedded so deeply across the software ecosystem that the vulnerability propagated almost everywhere at once and most organizations had no idea where they were exposed. The rush to assess risk revealed a fundamental problem: few teams maintained a reliable Software Bill of Materials (SBOM), and the question “are we affected?” had no quick answer.&lt;/p&gt;

&lt;p&gt;The scramble had at least one useful side effect: it pushed many teams to finally migrate from Log4j 1, end-of-life since 2015, to Log4j 2.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lessons from the Log4j perspective
&lt;/h2&gt;

&lt;p&gt;Since Log4j is mostly consumed as a dependency rather than built upon, the lessons the Apache Software Foundation’s Logging Services team drew from Log4Shell were different from those of the broader ecosystem. The problems were not about visibility into our own dependencies, but about the state of the project itself:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Documentation was hard to navigate, with many features either undocumented or described only in terms a new contributor could not act on.&lt;/li&gt;
&lt;li&gt;The release process was antiquated, understood by only a handful of people, and run on personal hardware: a single point of failure that nobody had reason to address until a crisis made it unavoidable.&lt;/li&gt;
&lt;li&gt;Builds were slow and tests were flaky, meaning a failure late in a multi-hour process sent you back to the beginning.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None of these were unique to Log4j. Log4Shell made them impossible to ignore, and addressing them put us on a path that anticipates much of what the CRA now asks of software maintainers.&lt;/p&gt;

&lt;h2&gt;
  
  
  Documentation: from maintainer knowledge to public record
&lt;/h2&gt;

&lt;p&gt;Logging is not always safe. There are real security concerns: CRLF injection from unstructured logging; sensitive information leaking into debug output; and injection of Log4j formatting patterns through user-supplied strings. Before Log4Shell, much of this knowledge lived in the heads of a few maintainers: not written down, not discoverable, and not actionable for the thousands of teams depending on the library.&lt;/p&gt;
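&lt;p&gt;The CRLF-injection concern is easy to demonstrate without any logging framework at all: if user-controlled text reaches a log line unescaped, an embedded newline can forge a second entry. A minimal, stdlib-only sketch (the class and method names are illustrative, not Log4j APIs):&lt;/p&gt;

```java
// Illustrative only: shows why unescaped user input in log lines is dangerous,
// and one common mitigation (escaping CR/LF before the text reaches the log).
public class LogSanitizer {

    // Replace carriage returns and line feeds with visible escapes so an
    // attacker-supplied value cannot start a new, forged log line.
    static String sanitize(String input) {
        return input.replace("\r", "\\r").replace("\n", "\\n");
    }

    public static void main(String[] args) {
        String attack = "alice\n2021-12-09 INFO login succeeded user=admin";
        System.out.println("login failed user=" + attack);           // prints a forged second line
        System.out.println("login failed user=" + sanitize(attack)); // stays on one line
    }
}
```

&lt;p&gt;Real deployments would handle this at the layout or appender level rather than at each call site; the point is that the expectation has to be documented somewhere users can actually find it.&lt;/p&gt;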

&lt;p&gt;We rewrote the documentation website from scratch. The goal was to turn that private knowledge base into a public record by:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Covering security best practices and an explicit &lt;a href="https://logging.apache.org/security.html" rel="noopener noreferrer"&gt;security model&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Providing reference documentation generated directly from code, so it stays in sync as the library evolves&lt;/li&gt;
&lt;li&gt;Making Log4j’s versioning policy and support status explicit and visible, both required for CRA attestations&lt;/li&gt;
&lt;li&gt;Moving the issue tracker from JIRA closer to the code in GitHub Issues&lt;/li&gt;
&lt;li&gt;Mirroring some discussions on both GitHub Discussions and mailing lists&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The results were measurable: more documentation pull requests, more site visits (a useful proxy for coverage and clarity), and noticeably better answers from LLMs trained on our new content.&lt;/p&gt;

&lt;h2&gt;
  
  
  Release process: from manual to reproducible
&lt;/h2&gt;

&lt;p&gt;In December 2021, Log4j’s tests ran on a Jenkins instance, binaries were built on maintainer machines, signing was manual, and builds were not reproducible. A full binary and site build literally took hours. This was not unusual for open source projects, but it created real risks around build integrity, and it was clearly not sustainable.&lt;/p&gt;

&lt;p&gt;By September 2024 we had migrated to GitHub Actions, achieved reproducible builds signed by a CI GPG key only known to ASF admins, parallelized tests, and reduced the build-and-deploy cycle to around 30 minutes.&lt;/p&gt;

&lt;p&gt;Currently:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The CI pipeline now automatically stages releases up to the voting phase, making Log4j the first project in the ASF to do this.&lt;/li&gt;
&lt;li&gt;We are working on integration with Apache Trusted Releases, which will bring automation to the voting and publishing steps as well.&lt;/li&gt;
&lt;li&gt;We are working on full SLSA build and source attestations, which will make us one of the first ASF projects to achieve this. This includes SLSA source level 4, requiring a non-author review for every commit: a critical guarantee for a project at the center of the most significant supply-chain incident in recent memory.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Machine-readable metadata: SBOMs, VEX, and beyond
&lt;/h2&gt;

&lt;p&gt;One of the most concrete CRA requirements is the expectation that software comes with machine-readable security information. We now publish CycloneDX SBOMs to Maven Central; these reference a Vulnerability Disclosure Report, a machine-readable version of our CVE list, on our website. This gives downstream users a complete, well-curated source of vulnerability information, unaffected by the data loss that public vulnerability databases sometimes introduce when converting between formats. It is also open to improvements by contributors.&lt;/p&gt;
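&lt;p&gt;For readers who have not handled one, a CycloneDX SBOM is a small, well-specified JSON (or XML) document. A minimal illustrative fragment (the coordinates and version below are examples, not the published artifact):&lt;/p&gt;

```json
{
  "bomFormat": "CycloneDX",
  "specVersion": "1.5",
  "components": [
    {
      "type": "library",
      "group": "org.apache.logging.log4j",
      "name": "log4j-core",
      "version": "2.24.0",
      "purl": "pkg:maven/org.apache.logging.log4j/log4j-core@2.24.0"
    }
  ]
}
```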

&lt;p&gt;The next step is Vulnerability Exploitability eXchange (VEX) statements, generated automatically through an open source toolset we are developing with OpenRefactory. The system combines:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;An AI-backed Root Cause Service that identifies vulnerable methods&lt;/li&gt;
&lt;li&gt;A Call Graph Service that maps per-component call graphs&lt;/li&gt;
&lt;li&gt;A VEX Generation Service that determines the maximum reachable path and generates enriched VEX statements, which we call VEXplanations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We are currently testing this within Apache Solr and plan to extend it to Log4j and Commons. The goal is to give downstream users a machine-readable guarantee of no known exploitable vulnerabilities, assessed automatically rather than by hand.&lt;/p&gt;

&lt;p&gt;We are also planning to support Common Lifecycle Enumeration (ECMA-428), the machine-readable equivalent of our supported versions list, generated ASF-wide through Apache Trusted Releases.&lt;/p&gt;

&lt;h2&gt;
  
  
  Vulnerability handling: from ad hoc to structured
&lt;/h2&gt;

&lt;p&gt;Since Log4Shell, the Logging Services team has put in place several structural improvements to vulnerability handling:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A dedicated reporting address (&lt;a href="mailto:security@logging.apache.org"&gt;security@logging.apache.org&lt;/a&gt;) separate from the general ASF security team&lt;/li&gt;
&lt;li&gt;An explicit threat model that clearly separates the security responsibilities of Log4j from those of its consumers&lt;/li&gt;
&lt;li&gt;A bug bounty program hosted on YesWeHack, funded by the Sovereign Tech Resilience program&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Since July 2024, the program has received 140 reports across Log4cxx, Log4j, and Log4net, yielding 10 CVEs.&lt;/p&gt;

&lt;p&gt;At the architectural level, Log4j 3 addresses the attack surface problem directly. Log4j 2 was built for a pre-Maven world and shipped as a monolithic core with many optional dependencies bundled in. Log4j 3 modularises most of that core, so each module only pulls in what it actually needs. Smaller surface area means fewer things that can go wrong.&lt;/p&gt;

&lt;h2&gt;
  
  
  Community sustainability: the hardest and unsolved problem
&lt;/h2&gt;

&lt;p&gt;All of the above is meaningless if the project burns out. And that risk is real. Log4j currently has two active maintainers doing most of the work. Log4cxx and Log4net are in a similar position. This is not a criticism of the community. It is the structural reality of most open source projects, and it is the problem that CRA compliance pressure will make worse if it is not addressed alongside the technical work.&lt;/p&gt;

&lt;p&gt;We are exploring two directions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The &lt;a href="https://www.open-source-economy.com/" rel="noopener noreferrer"&gt;Open Source Economy&lt;/a&gt; initiative: offering consulting and compliance attestations as a funded model, where fees support maintainers, infrastructure, and upstream dependencies.&lt;/li&gt;
&lt;li&gt;ECMA TC54’s &lt;a href="https://tc54.org/contributing-yaml/" rel="noopener noreferrer"&gt;CONTRIBUTING.yaml&lt;/a&gt; specification: a machine-readable format for describing a project’s maintenance status, contribution needs, and support expectations, so that users and organisations can understand what they are depending on.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There is also a cultural dimension. Many open source communities grew up in an era when maintainers could commit directly to the repository and build software on their own machines. Shifting to a model where maintainers review all contributions and CI builds everything is the right move for security, but it is genuinely less fun, and communities resist it. Making that transition while keeping contributors engaged is one of the real challenges of building a CRA-ready project.&lt;/p&gt;

&lt;h2&gt;
  
  
  What CRA readiness actually looks like
&lt;/h2&gt;

&lt;p&gt;The CRA introduces a set of “due diligence” requirements for manufacturers and voluntary attestations for OSS projects. What I hope this post makes clear is that meeting them is not a compliance exercise you bolt on at the end. It is the output of years of work on documentation, build integrity, machine-readable metadata, vulnerability processes, and community health.&lt;/p&gt;

&lt;p&gt;The good news is that the direction was right before the regulation arrived. Log4Shell forced hard questions that led, eventually, to a project that is genuinely more secure, more transparent, and more sustainable than it was in 2021. The CRA gives that work a formal framework and, ideally, the organisational backing to fund it. For other open source projects facing the same pressures, that is perhaps the most useful takeaway: the path to CRA readiness and the path to a healthier, more sustainable project are the same path.&lt;/p&gt;

&lt;h2&gt;
  
  
  Get involved
&lt;/h2&gt;

&lt;p&gt;The path to a CRA-ready ecosystem is not walked by one project. If any of this matters to you, here is where to start:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Contribute to Log4j directly. Code, documentation, and review all count. Start at &lt;a href="https://logging.apache.org/" rel="noopener noreferrer"&gt;logging.apache.org&lt;/a&gt;, or report security issues to &lt;a href="mailto:security@logging.apache.org"&gt;security@logging.apache.org&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Help across the ASF. The &lt;a href="https://security.apache.org/contributing/" rel="noopener noreferrer"&gt;Contributing to Apache Security&lt;/a&gt; guide is the way in.&lt;/li&gt;
&lt;li&gt;Shape OSS regulatory response. The &lt;a href="https://orcwg.org/" rel="noopener noreferrer"&gt;Open Regulatory Compliance Working Group&lt;/a&gt; is where CRA implementation for open source is being worked out in public.&lt;/li&gt;
&lt;li&gt;Define what sustainable maintenance means. &lt;a href="https://tc54.org/contributing-yaml/" rel="noopener noreferrer"&gt;ECMA TC54 TG4&lt;/a&gt; is building the machine-readable standards for project health and support.&lt;/li&gt;
&lt;li&gt;Secure the supply chain. The &lt;a href="https://slsa.dev/community" rel="noopener noreferrer"&gt;SLSA Community&lt;/a&gt; is advancing the build-integrity framework Log4j will soon use.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>cybersecurity</category>
      <category>java</category>
      <category>opensource</category>
      <category>security</category>
    </item>
    <item>
      <title>Apache Geode 2.0: Revival, Reinvention, and the Road Ahead</title>
      <dc:creator>Whitney </dc:creator>
      <pubDate>Tue, 03 Mar 2026 18:22:06 +0000</pubDate>
      <link>https://dev.to/theasf/apache-geode-20-revival-reinvention-and-the-road-ahead-48o5</link>
      <guid>https://dev.to/theasf/apache-geode-20-revival-reinvention-and-the-road-ahead-48o5</guid>
      <description>&lt;p&gt;Originally published at &lt;a href="https://news.apache.org/foundation/entry/apache-geode-2-0-revival-reinvention-and-the-road-ahead" rel="noopener noreferrer"&gt;https://news.apache.org/foundation/entry/apache-geode-2-0-revival-reinvention-and-the-road-ahead&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;By: Jinwoo Hwang&lt;br&gt;
Lead Developer, Project Lead, and Release Manager, Apache Geode 2.0&lt;br&gt;
&lt;a href="https://JinwooHwang.com" rel="noopener noreferrer"&gt;https://JinwooHwang.com&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This post is divided into three parts. Part I explains why Apache Geode 2.0 matters. Part II walks through how it was modernized. Part III looks ahead—what we learned, what changed, and how you can help shape what comes next.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Apache Geode 2.0, Part I: The Revival of Apache Geode
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;Legacy, purpose, and the moment a terminated project came back to life&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Apache Geode 2.0 is not just a new release—it is a statement of intent. Before diving into code, frameworks, or version numbers, it is worth understanding why this release exists at all. &lt;/p&gt;

&lt;p&gt;This story begins with a platform that once powered mission-critical systems, drifted toward obsolescence, and then found new life through conviction, persistence, and community. This first section sets the stage: the purpose behind the work, the legacy of Apache Geode, and the moment when a seemingly finished project began its comeback.&lt;/p&gt;

&lt;p&gt;I have the privilege of serving as a Committer and Release Manager for Apache Geode 2.0. This release represents one of the most ambitious modernization efforts in the project’s history. For me, it has been more than engineering work—it has been a journey shaped by purpose, responsibility, and a deep belief in the value of our shared open source legacy.&lt;/p&gt;

&lt;p&gt;When I stepped into these roles, it became clear that Apache Geode could not survive on incremental change. The Java ecosystem had moved forward—Jakarta EE, Spring, Jetty, Tomcat, and security practices had all evolved—while Geode had effectively stood still. At the same time, unpatched vulnerabilities threatened user trust. To remain relevant, Geode needed a fundamental reset: technically, architecturally, and culturally.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why I Took on This Project
&lt;/h2&gt;

&lt;p&gt;I do not earn additional compensation for the nights and weekends spent on Apache Geode. I am grateful to my employer for supporting my open source contributions, but this work did not replace my day job. I carried the same responsibilities while taking on this effort.&lt;/p&gt;

&lt;p&gt;The reason I stayed with it is simple: purpose. I believe in this project and the community behind it. Friedrich Nietzsche famously wrote, “He who has a why to live can bear almost any how,” an idea later echoed by Viktor Frankl in his work on meaning and resilience. That sense of why—of keeping something valuable alive—carried me through the hardest moments of this journey.&lt;/p&gt;

&lt;p&gt;With that context, it is worth stepping back and answering a foundational question.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Exactly Is Apache Geode?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4nisyyp5u62k8v5l9par.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4nisyyp5u62k8v5l9par.png" alt="What is Apache Geode" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Apache Geode is a distributed, in‑memory data management platform designed for low‑latency, scalable, and consistent data access. It is built for systems that must react in real time, handle massive data volumes, and remain operational under failure. Data is dynamically partitioned or replicated across a cluster, with built‑in fault tolerance and optional persistence to disk.&lt;/p&gt;

&lt;p&gt;As modern applications have shifted toward real-time analytics, event-driven architectures, and microservices, latency has become a central architectural constraint. Disk-backed storage systems, while durable and cost-efficient, often introduce millisecond-scale access times that are incompatible with sub-millisecond response requirements. In-memory data platforms address this gap by keeping active or frequently accessed data in RAM, significantly reducing access latency and increasing throughput. This approach is particularly important in domains such as financial services, telecommunications, e-commerce, and IoT, where responsiveness, scale, and availability directly influence user experience and operational outcomes.&lt;/p&gt;

&lt;p&gt;At its core, Geode aggregates memory, CPU, and network resources across multiple nodes into a single, coherent data fabric. Applications continue running even when individual nodes fail—no interruptions, no downtime. Geode supports multiple deployment models, including peer‑to‑peer, client/server, and multi‑site configurations, enabling it to scale from tightly coupled application clusters to geographically distributed systems.&lt;/p&gt;

&lt;h2&gt;
  
  
  From GemFire to Geode—and Almost to Obsolescence
&lt;/h2&gt;

&lt;p&gt;Apache Geode’s lineage traces back to 2002, when GemStone Systems introduced GemFire, a commercial platform widely used in financial services for real‑time workloads. Through acquisitions—GemStone to SpringSource, then to VMware, and later to Pivotal—the technology evolved before being open sourced in 2015 and donated to The Apache Software Foundation as Apache Geode.&lt;/p&gt;

&lt;p&gt;For several years, the project thrived. But after 2019, corporate shifts and changing priorities reduced contributor engagement. By 2022, most committers were inactive. By mid‑2023, development had stopped entirely. In 2024, the PMC voted to terminate the project. Apache Geode appeared to be finished.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fauaxmrph2r7mxeva3cv3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fauaxmrph2r7mxeva3cv3.png" alt="Apache Geode Contributors Over Time" width="800" height="450"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Apache Geode Contributors Over Time&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Then came 2025. I began upstreaming my internal fork. The community delivered Apache Geode 1.15.2 in September, followed by Apache Geode 2.0 in December. What looked like an ending became a comeback—a transition from long winter to spring.&lt;/p&gt;

&lt;p&gt;By the time Apache Geode’s revival began, it was clear that survival alone was not enough. To remain viable, trusted, and relevant, the platform would need far more than incremental fixes—it would need a complete modernization from the ground up.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Learn what Apache Geode is &lt;a href="https://geode.apache.org/" rel="noopener noreferrer"&gt;https://geode.apache.org/&lt;/a&gt;&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>database</category>
      <category>distributedsystems</category>
      <category>news</category>
      <category>opensource</category>
    </item>
    <item>
      <title>How DiDi Scaled to Hundreds of Petabytes with Apache Ozone</title>
      <dc:creator>Whitney </dc:creator>
      <pubDate>Thu, 29 Jan 2026 23:42:28 +0000</pubDate>
      <link>https://dev.to/theasf/how-didi-scaled-to-hundreds-of-petabytes-with-apache-ozone-2bdk</link>
      <guid>https://dev.to/theasf/how-didi-scaled-to-hundreds-of-petabytes-with-apache-ozone-2bdk</guid>
      <description>&lt;p&gt;&lt;strong&gt;&lt;em&gt;Building a cost-effective, high-performance data foundation for global mobility&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When you’re operating one of the world’s largest ride-hailing and mobility platforms, every millisecond and megabyte counts. For DiDi Global, which generates over one petabyte of new data every day, scaling storage isn’t just a technical challenge—it’s a business imperative.&lt;/p&gt;

&lt;p&gt;As the company’s data footprint grew to more than 500PB annually, DiDi’s engineers found themselves battling the limits of their legacy Apache Hadoop® Distributed File System (HDFS) storage layer. The infrastructure was struggling to keep pace with the company’s explosive data growth, slowing downstream analytics and machine learning (ML) workloads that power everything from route optimization to dynamic pricing.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Challenge: Scaling Without Compromise
&lt;/h2&gt;

&lt;p&gt;DiDi’s HDFS-based infrastructure had served the company well, but it was beginning to show its age under the weight of petabyte-scale workloads. The team faced several interconnected problems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Metadata bottlenecks:&lt;/strong&gt; File count limits in HDFS created stress on metadata services, driving up latency and throttling performance.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Read-heavy workloads:&lt;/strong&gt; RPC congestion and HDD I/O bottlenecks introduced lag for analytics and AI pipelines.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Escalating costs:&lt;/strong&gt; Triple replication inflated storage use and operational expenses.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Operational risk:&lt;/strong&gt; Even routine maintenance, such as decommissioning, carried stability concerns.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These issues had tangible business impacts. Slow metadata operations increased latency for end users, inflated costs, and created risks during peak demand periods.&lt;/p&gt;

&lt;p&gt;“Metadata latency wasn’t just a technical problem—it slowed down business units that rely on real-time analytics and AI insights,” said JiangHua Zhu, Software Engineer, DiDi’s Storage Team.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Solution: Apache Ozone
&lt;/h2&gt;

&lt;p&gt;After a rigorous evaluation, DiDi selected Apache Ozone™, a next-generation distributed storage system designed for scalability and performance in large, unstructured data environments.&lt;/p&gt;

&lt;p&gt;Ozone’s modern architecture—featuring RocksDB-based metadata management, separation of Object Manager (OM) and Storage Container Manager (SCM) services, and containerized data storage—provides the foundation DiDi needed to scale with confidence.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Benefits
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Massive scalability:&lt;/strong&gt; Ozone comfortably supports tens of billions of files, removing HDFS metadata constraints.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Performance optimizations:&lt;/strong&gt; Features like OM Follower Read, multi-cluster routing, and NVMe caching help minimize latency and balance system load.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Cost efficiency through Erasure Coding:&lt;/strong&gt;&lt;br&gt;
Transitioning from 3x replication to EC 6-3 reduced storage overhead from 3.0x to roughly 1.5x—saving hundreds of petabytes.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Enhanced resilience:&lt;/strong&gt; Container-based data granularity improves fault tolerance and streamlines operations.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
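&lt;p&gt;The erasure-coding savings follow directly from the block arithmetic: an EC 6-3 layout writes 3 parity blocks for every 6 data blocks, so 9 physical blocks hold 6 blocks of data. A quick sketch of that calculation (illustrative, not DiDi code):&lt;/p&gt;

```java
// Storage overhead: physical bytes stored per logical byte of data.
public class StorageOverhead {

    // N-way replication stores N full copies of every block.
    static double replication(int copies) {
        return copies;
    }

    // Erasure coding stores (data + parity) blocks for every `data` blocks.
    static double erasureCoding(int dataBlocks, int parityBlocks) {
        return (dataBlocks + parityBlocks) / (double) dataBlocks;
    }

    public static void main(String[] args) {
        System.out.println("3x replication overhead: " + replication(3));     // 3.0
        System.out.println("EC 6-3 overhead:         " + erasureCoding(6, 3)); // 1.5
    }
}
```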

&lt;p&gt;“Ozone gave us the flexibility to scale elastically across hundreds of petabytes without sacrificing performance,” said Wei Ming, DiDi engineer.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Results: Faster, Leaner, and More Reliable
&lt;/h2&gt;

&lt;p&gt;The move to Apache Ozone delivered measurable, cross-functional benefits across DiDi’s data ecosystem:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Latency:&lt;/strong&gt; P90 GetMetaLatency improved from 90ms to 17ms.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Throughput:&lt;/strong&gt; Production read throughput increased by more than 20% with OM follower reads.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Cost savings:&lt;/strong&gt; Erasure Coding cut the storage footprint nearly in half, saving both capital and operational expenses.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Stability under load:&lt;/strong&gt; The platform now operates smoothly even during cluster maintenance and peak traffic periods.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Developer productivity:&lt;/strong&gt; Application teams no longer need to manage small-file compaction, reducing complexity and accelerating data delivery.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Smooth Adoption Through Planning and Community Collaboration
&lt;/h2&gt;

&lt;p&gt;DiDi’s migration to Ozone was meticulous and deliberate. Engineers ensured data consistency with DistCp COMPOSITE_CRC checksums, implemented dual-write for rollback safety, and validated end-to-end compatibility with Hadoop, Apache Spark™, and S3 APIs.&lt;/p&gt;
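&lt;p&gt;For readers unfamiliar with the mechanism, the checksum step relies on Hadoop’s composite-CRC mode, which allows checksums to be compared across different filesystems. The shape of such a copy (paths and service names here are placeholders, not DiDi’s configuration) looks roughly like:&lt;/p&gt;

```shell
# Illustrative shape of a checksum-verified copy from HDFS to Ozone.
hadoop distcp \
  -Ddfs.checksum.combine.mode=COMPOSITE_CRC \
  hdfs://source-cluster/warehouse/events \
  ofs://ozone-service/volume/bucket/warehouse/events
```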

&lt;p&gt;The company also leaned heavily on the Apache Ozone open source community—which contributed bug fixes, performance enhancements, and feedback that benefit all users.&lt;/p&gt;

&lt;p&gt;“The open source community was instrumental in our success—we gained support, shared knowledge, and received bug fixes that help everyone,” said Shilun Fan, a member of DiDi’s storage leadership.&lt;/p&gt;

&lt;p&gt;DiDi engineers even became active contributors, helping resolve issues such as metadata inconsistencies and Erasure Coding container handling. The collaboration ultimately strengthened both DiDi’s deployment and the broader Ozone ecosystem.&lt;/p&gt;

&lt;h2&gt;
  
  
  Technical Highlights
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Storage savings:&lt;/strong&gt; Hundreds of petabytes saved through Erasure Coding (6-3).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Read efficiency:&lt;/strong&gt; 20%+ improvement from OM follower reads and NVMe caching.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Unified access:&lt;/strong&gt; Hadoop API and S3 compatibility for batch, interactive, and ML workloads.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Scalability:&lt;/strong&gt; A single Ozone cluster can handle ~5 billion files, with the potential to scale to tens of billions.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Looking Ahead
&lt;/h2&gt;

&lt;p&gt;DiDi’s storage team continues to push the boundaries of performance and efficiency. Upcoming initiatives include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Integrating io_uring and SPDK to enhance I/O performance.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Developing AI-driven operational insights for anomaly detection and auto-remediation.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Piloting tiered storage strategies for hot, warm, and cold data layers to optimize cost and performance.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;“Ozone is more than a storage layer—it’s the backbone of DiDi’s data ecosystem and future AI innovation,” said Hongbing Wang, DiDi technical lead.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Takeaway
&lt;/h2&gt;

&lt;p&gt;By embracing Apache Ozone, DiDi transformed its data storage infrastructure from a limitation into a competitive advantage. The move delivered lower costs, higher reliability, and faster access to the insights that power intelligent mobility.&lt;/p&gt;

&lt;p&gt;At petabyte scale, even incremental improvements deliver outsized impact—and with Apache Ozone, DiDi has built a storage foundation ready for the next decade of data-driven innovation.&lt;/p&gt;

&lt;p&gt;To learn more about Apache Ozone:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Apache Ozone GitHub: &lt;a href="https://github.com/apache/ozone" rel="noopener noreferrer"&gt;https://github.com/apache/ozone&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Apache Ozone Getting Started: &lt;a href="https://ozone.apache.org/docs/edge/start/startfromdockerhub.html" rel="noopener noreferrer"&gt;https://ozone.apache.org/docs/edge/start/startfromdockerhub.html&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Apache Ozone LinkedIn page: &lt;a href="https://www.linkedin.com/company/apache-ozone/" rel="noopener noreferrer"&gt;https://www.linkedin.com/company/apache-ozone/&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Apache Ozone X.com handle: &lt;a href="https://x.com/ApacheOzone" rel="noopener noreferrer"&gt;https://x.com/ApacheOzone&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Apache Ozone Best Practices at Didi: &lt;a href="https://ozone.apache.org/assets/ApacheOzoneBestPracticesAtDidi.pdf" rel="noopener noreferrer"&gt;https://ozone.apache.org/assets/ApacheOzoneBestPracticesAtDidi.pdf&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>architecture</category>
      <category>dataengineering</category>
      <category>opensource</category>
      <category>performance</category>
    </item>
  </channel>
</rss>
