Sudip Sengupta

Posted on Oct 2, 2021

Why OpenEBS 3.0 for Kubernetes and Storage?

Back before Kubernetes supported StatefulSets, a number of engineers, including the founders of the OpenEBS project, anticipated the potential of Kubernetes as a means to build and operate scalable, easy-to-use storage and data orchestration software. Fast forward a few years and, according to recent surveys, OpenEBS has become the most popular open source Container Attached Storage; we estimate that nearly 1 million nodes of OpenEBS are deployed per week. OpenEBS has become a particular favorite when deploying and operating resilient workloads on Kubernetes such as Cassandra and Elastic, with reference users including ByteDance / TikTok, Flipkart and Bloomberg.

In this blog I am pleased to announce the release of OpenEBS 3.0 - a major release that includes several enhancements and new features both at the storage layer and at the data orchestration layer.

Before I explain more about OpenEBS 3.0, let me share some background on the use of Kubernetes for data.

First, it is worth emphasizing that Kubernetes is popular in part because it enables small teams to be largely autonomous and to remain loosely coupled from the rest of the organization, much as the cloud native workloads they build remain loosely coupled from other workloads.

Secondly, the desire for small team and workload autonomy has led to the widespread use of Direct Attached Storage and similar patterns for cloud native stateful workloads. Workloads such as Cassandra, Elastic and dozens of others, and the teams that run them, would rather have their own systems dedicated to these workloads than rely on the vagaries and shared blast radius of traditional shared storage. Shared storage in many ways is redundant or moot when the workloads themselves are highly resilient and loosely coupled - especially when these workloads run on Kubernetes, which provides a common approach to scalable deployment and operations.

Container Attached Storage, of which OpenEBS is the leading open source example, emerged in response to the needs for autonomy and control by these small teams and workloads. You can think of Container Attached Storage as providing “just enough” per workload capabilities without introducing the sort of shared dependencies that today’s organizations, and API-first cloud native architectures, are built to avoid.

And yet, the expertise of the engineers that have built shared storage systems and the operators that have kept them running remains invaluable. One common pattern that we see at the forefront of the use of Kubernetes for data is the need to address use cases that were common when discrete shared storage systems were the foundation of architectures. Many of these use cases, such as capacity management and building various policies around the requirements of workloads, are at least as important as before. One theme of OpenEBS 3.0 is that these and related use cases are being reimagined and implemented in a way that I see as extremely promising, delivering both the autonomy needed by users of Kubernetes for data and the control and visibility required by enterprises increasingly relying upon Kubernetes to build and operate the software that runs their business.

In the sections to follow, I highlight the various enhancements in OpenEBS 3.0 and the benefits of using OpenEBS for Kubernetes Storage.

OpenEBS 3.0 for Kubernetes Storage

We are grateful for the support and contributions of the vibrant open-source community that OpenEBS has received. We are also thankful to the Cloud Native Computing Foundation (CNCF) for including OpenEBS as one of its storage projects. And a special thanks to the CNCF for being a reference user of OpenEBS as well - you can read about their experience and that of others including TikTok / ByteDance and Verizon / Yahoo on Adopters.md. Collectively, these aspects have helped my team to notice challenges and opportunities and of course to resolve bugs and improve the polish of OpenEBS with each release.

One can visualize the additions to OpenEBS as going in two directions or dimensions:

Horizontally - enabling the use of resilient workloads at significant scale - often using LVM or a variety of similar alternatives on local nodes - here we see users like Flipkart and Bloomberg helping us understand what it is like to run dozens of workloads on thousands of nodes with the help of Kubernetes.

Vertically - pushing down to the layers of NVMe, io_uring, SPDK and so on to provide a software-defined data storage layer for replicated storage that is refactored, written in Rust, and performant. This is of course a multi-year project, which we call OpenEBS Mayastor. We are a couple of years into working on this project and so far, so good. Use cases include the Kubernetes edge, including running on ARM form factors, and on clouds and in the DC where workloads need additional resilience and data services.

Advances in OpenEBS 3.0 along the horizontal dimension, covering local node management, capacity-based scheduling, and other operational improvements, include:

LocalPV - New Features and Enhancements

OpenEBS uses LocalPV provisioners to connect applications directly with storage from a single node. This storage object, known as LocalPV, is subject to the availability of the node on which it is mounted, making it a handy feature for fault-tolerant applications that prefer local storage over traditional shared storage. The OpenEBS LocalPV provisioner enables Kubernetes-based stateful applications to leverage several types of local storage, ranging from raw block devices to the capabilities of volume managers and filesystems on top of those devices, such as LVM and ZFS.

OpenEBS 3.0 includes the following enhancements to the LocalPV provisioner:

Support for 3 new types of LocalPVs, namely LVM LocalPV, Rawfile LocalPV and Partition LocalPV, in addition to the previously supported Hostpath LocalPV, Device LocalPV and ZFS LocalPV.

OpenEBS Hostpath LocalPV (declared stable), the first and most widely used LocalPV, now supports enforcing XFS quotas and the ability to use a custom node label for node affinity (instead of the default 'kubernetes.io/hostname').

OpenEBS ZFS LocalPV (declared stable), used widely for production workloads that need direct and resilient storage, has added new capabilities like:

Velero plugin to perform incremental backups that make use of the copy-on-write ZFS snapshots.
CSI capacity-based scheduling, used with 'WaitForFirstConsumer' bound Persistent Volumes.
Improvements to the inbuilt volume scheduler (used with 'Immediate' bound Persistent Volumes), which can now take into account the capacity and the count of volumes provisioned per node.
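
To make the scheduling idea concrete, here is a minimal Python sketch of choosing a node by free capacity or by volume count. It is an illustration only, not the OpenEBS scheduler: the `NodeState` type, the `pick_node` function and the `prefer` parameter are hypothetical names invented for this example.

```python
from dataclasses import dataclass


@dataclass
class NodeState:
    """Hypothetical view of one node's storage pool (not an OpenEBS type)."""
    name: str
    free_bytes: int
    volume_count: int


def pick_node(nodes, requested_bytes, prefer="capacity"):
    """Return the name of the node to place a volume on, or None if none fits.

    prefer="capacity": choose the node with the most free space.
    prefer="count":    choose the node with the fewest provisioned volumes.
    """
    # Only nodes with enough free space for the request are candidates.
    candidates = [n for n in nodes if n.free_bytes >= requested_bytes]
    if not candidates:
        return None

    if prefer == "capacity":
        # Most free space wins; fewer volumes breaks ties.
        best = max(candidates, key=lambda n: (n.free_bytes, -n.volume_count))
    elif prefer == "count":
        # Fewest volumes wins; more free space breaks ties.
        best = min(candidates, key=lambda n: (n.volume_count, -n.free_bytes))
    else:
        raise ValueError(f"unknown preference: {prefer!r}")
    return best.name
```

Filtering on capacity first means a request is never placed on a node that cannot hold it, whichever preference is used for ranking.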

OpenEBS LVM LocalPV (declared stable) can be used to provision volumes on top of LVM Volume Groups and supports the following features:

Thick (Default) or Thin Provisioned Volumes.
CSI capacity-based scheduling, used with 'WaitForFirstConsumer' bound Persistent Volumes.
Snapshots that translate into LVM Snapshots.
Ability to set QoS on the containers using LVM Volumes.
Also supports other CSI capabilities like volume expansion, raw or filesystem mode, and metrics.
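
The difference between thick and thin provisioning can be sketched with a toy accounting model in Python. This is a deliberate simplification and not the LVM API: real LVM thin volumes live inside a thin pool carved out of the volume group, and the `VolumeGroup` class and its methods below are hypothetical names used only for illustration.

```python
class VolumeGroup:
    """Toy model of a storage pool holding thick and thin volumes."""

    def __init__(self, capacity_bytes):
        self.capacity_bytes = capacity_bytes
        self.thick_reserved = 0    # space reserved up front by thick volumes
        self.thin_provisioned = 0  # total size promised to thin volumes
        self.thin_written = 0      # space thin volumes actually consume

    def free_bytes(self):
        # Thick volumes consume their full size; thin volumes only what is written.
        return self.capacity_bytes - self.thick_reserved - self.thin_written

    def create_thick(self, size):
        # A thick volume must fit entirely at creation time.
        if size > self.free_bytes():
            raise ValueError("not enough free space for thick volume")
        self.thick_reserved += size

    def create_thin(self, size):
        # A thin volume reserves nothing up front, so the pool can be overcommitted.
        self.thin_provisioned += size

    def write_thin(self, nbytes):
        # Space is consumed only when data is written.
        if self.thin_written + nbytes > self.thin_provisioned:
            raise ValueError("write exceeds provisioned thin size")
        if nbytes > self.free_bytes():
            raise RuntimeError("pool exhausted")
        self.thin_written += nbytes
```

The trade-off the model shows is that thin volumes allow overcommitment, at the cost of a possible out-of-space failure at write time rather than at creation time.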

OpenEBS Rawfile LocalPV (declared beta) is a preferred choice for creating local volumes backed by a sparse file within a sub-directory, and supports capacity enforcement as well as filesystem or block volumes.
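
For readers unfamiliar with sparse files, the following Python sketch shows the general idea: a file can have a large apparent size while occupying little or no disk space until data is written. It is not taken from the Rawfile LocalPV code; the function names are hypothetical, and whether the file is truly sparse depends on the underlying filesystem.

```python
import os
import tempfile


def create_sparse_file(path, size_bytes):
    """Create a file with the given apparent size without writing any data."""
    with open(path, "wb") as f:
        # Extending an empty file leaves a "hole" on filesystems that support it.
        f.truncate(size_bytes)


def describe(path):
    """Return (apparent_size, allocated_bytes or None if not reported)."""
    st = os.stat(path)
    # st_blocks counts 512-byte units on Unix; it is not available on all platforms.
    blocks = getattr(st, "st_blocks", None)
    allocated = blocks * 512 if blocks is not None else None
    return st.st_size, allocated


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "volume.img")
        create_sparse_file(path, 1024 * 1024 * 1024)  # 1 GiB apparent size
        print(describe(path))
```

Because the apparent size is fixed at creation, it gives a natural upper bound for a volume stored in the file, which is one way capacity can be enforced.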

OpenEBS Device LocalPV (declared beta) is a preferred choice for running workloads that have typically worked well when consuming full disks in block mode. This provisioner uses the OpenEBS Node Disk Manager (NDM) component to select the block device.

OpenEBS Partition LocalPV (an alpha engine) is under active development and is being deployed by select users for creating volumes by dynamically partitioning a disk with the requested capacity from the PVC.

ReplicatedPV - New Features and Enhancements

OpenEBS can also use ReplicatedPV provisioners to connect applications to volumes whose data is synchronously replicated to multiple storage nodes. This storage object, known as ReplicatedPV, is highly available and can be mounted from multiple nodes in the cluster. OpenEBS supports three types of ReplicatedPVs: Jiva (based on Longhorn and iSCSI), CStor (based on ZFS and iSCSI) and Mayastor (based on SPDK and NVMe).
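
As a rough illustration of what "synchronously replicated" means, the Python sketch below acknowledges a write only after the required number of replicas have stored it. This is a conceptual model, not how Jiva, CStor or Mayastor are implemented; the `Replica` class, `replicated_write` function and `required` parameter are hypothetical, and real engines have their own quorum and rebuild rules.

```python
class Replica:
    """In-memory stand-in for a storage replica on one node."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.blocks = {}

    def write(self, offset, data):
        if not self.healthy:
            raise OSError(f"replica {self.name} is unavailable")
        self.blocks[offset] = data


def replicated_write(replicas, offset, data, required=None):
    """Write to every replica and return the number of acknowledgements.

    The write succeeds only if at least `required` replicas stored the data
    (default: all of them); otherwise OSError is raised.
    """
    if required is None:
        required = len(replicas)

    acks = 0
    for replica in replicas:
        try:
            replica.write(offset, data)
            acks += 1
        except OSError:
            # A failed replica simply does not count towards the acknowledgement.
            continue

    if acks < required:
        raise OSError(f"only {acks} of {required} required replicas acknowledged")
    return acks
```

The point of acknowledging only after the replicas have the data is that the loss of a single node does not lose acknowledged writes.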

Some enhancements to replicated storage engines in OpenEBS 3.0 include:

OpenEBS Jiva (declared stable) has added support for a CSI Driver and a Jiva operator that include features like:

Enhanced management of the replicas.
Ability to auto-remount volumes that were marked read-only due to an iSCSI timeout back to read-write.
Faster detection of node failure, helping Kubernetes move the application from the failed node to a new node.
3.0 also deprecates the older Jiva volume provisioners, which were based on the Kubernetes external storage provisioner. No more features will be added to the older provisioners, and users are requested to migrate their Volumes to CSI Drivers as soon as possible.

OpenEBS CStor (declared stable) has added support for a CSI Driver and also improved custom resources and operators for managing the lifecycle of CStor Pools. The 3.0 version of CStor includes:

An improved schema that allows users to declaratively run operations like replacing disks in mirrored CStor pools, adding new disks, scaling up replicas, or moving CStor Pools to a new node. The new custom resource for configuring CStor is called CStorPoolCluster (CSPC), replacing the older StoragePoolCluster (SPC).
Ability to auto-remount volumes that were marked read-only due to an iSCSI timeout back to read-write.
Faster detection of node failure, helping Kubernetes move the application from the failed node to a new node.
3.0 also deprecates the older CStor volume provisioners and the pool operators based on SPC, which were based on the Kubernetes external storage provisioner. No more features will be added to the older provisioners, and users are requested to migrate their Pools to CSPC and Volumes to CSI Drivers as soon as possible.

Advances in OpenEBS 3.0 in the vertical dimension, including additional resilience with performance via Mayastor (beta), include:

A new and enhanced control plane to manage the Mayastor pools and volumes.
Support for deprecating the MOAC based control plane in favor of the new control plane.
Enhanced control plane to handle node failure scenarios and move the volumes to new nodes.
Stabilizing the Mayastor data engine for durability and performance.
Enhanced E2E tests.

Other Notable Features and Enhancements

Beyond the improvements to the data engines and their corresponding control plane, there are several new enhancements that will help with ease of use of OpenEBS engines:

OpenEBS CLI (a kubectl plugin) for easily checking the status of the block devices, pools (storage) and volumes (PVs).
OpenEBS Dashboard (a Prometheus and Grafana mixin) that can be installed via jsonnet or helm chart with a set of default Grafana Dashboards and Alertmanager rules for OpenEBS storage engines.
Dynamic NFS Provisioner that allows users to launch a new NFS server on any RWO volume (called backend volume) and expose an RWX volume that saves the data to the backend volume.
Several fixes and enhancements to the Node Disk Manager such as automatically adding a reservation tag to devices, detecting filesystem changes, and updating the block device CR (without the need for a reboot), as well as an improved metrics exporter and an API service that can be extended in the future to implement storage pooling or cleanup hooks.
Kubernetes Operator for automatically upgrading Jiva and CStor volumes via a Kubernetes Job.
Kubernetes Operator for automatically migrating CStor Pools and Volumes from older pool schema and legacy (external storage based) provisioners to the new Pool Schema and CSI volumes respectively.
Enhanced OpenEBS helm chart that can easily enable or disable a data engine of choice. The 3.0 helm chart stops installing the legacy CStor and Jiva provisioners. If you would like to continue to use them, you have to set the flag “legacy.enabled=true”.
OpenEBS helm chart includes sample Kyverno policies that can be used as an optional replacement for PodSecurityPolicies (PSP).
OpenEBS images are delivered as multi-arch images with support for AMD64 and ARM64 and hosted on DockerHub, Quay and GHCR.
Improved support for installation in air gapped environments.
Enhanced Documentation and Troubleshooting guides for each of the engines located in the respective engine repositories.
Last but not least: a new and improved design for the OpenEBS website.

Benefits of OpenEBS for Stateful Kubernetes Workloads

OpenEBS is used as a persistent storage solution for many stateful Kubernetes applications as it offers benefits such as:

Open-source cloud-native storage

Built fully in Kubernetes, OpenEBS follows a loosely-coupled architecture that brings the benefits of cloud-native computing to storage. The solution runs in the Kubernetes userspace, which makes it portable enough to run on any platform/operating system.

Eliminate vendor lock-in

When an OpenEBS Storage Engine (such as cStor, Mayastor or Jiva) is used, it acts as a data abstraction layer. The data can easily be moved between different Kubernetes environments, whether on-premises, on traditional storage, or in the cloud.

Granular policies for stateful workloads

Since OpenEBS volumes are managed independently, organizations can enable collaboration between small, loosely coupled teams. Storage parameters for each volume and workload can be monitored independently, allowing for the granular management of storage policies.

Reduced Storage Costs

Microservices-based storage orchestration allows for thin provisioning of pooled storage, and data volumes can be grown as needed. This also means that storage volumes can be added instantaneously without disrupting applications or volumes exposed to workloads.

High Availability

With Container Attached Storage (CAS), storage controllers are rescheduled in case of node failures. This allows OpenEBS to survive pod restarts, while the stored data is protected through synchronous replication by the data engines. When a node fails, only the volume replicas on that node are lost.

Disks Managed Natively on Kubernetes

OpenEBS includes the Node Disk Manager (NDM) that enables administrators to manage disks using inherent Kubernetes constructs. NDM also allows administrators to automate storage needs such as performance planning, volume management, and capacity planning using efficient pool and volume policies.

With an upgrade to OpenEBS 3.0, you also get:

Enhanced Quality of Service (QoS) for efficient pod scheduling and eviction.
Mayastor, which is built on Intel SPDK to serve IO for scalable, high-performance storage applications.
Rapid disaster recovery for cluster/etcd failures using Playbooks.
Capacity-based scheduling to optimize storage costs.

Summary

Storage orchestration in Kubernetes requires a novel approach since the platform was initially built to manage stateless, ephemeral containers. This was one of the most critical problems that we had on our minds when we started developing OpenEBS. We helped to build various projects, such as NDM, LocalPV and Read-Write-Many (RWX) PVCs, to ensure the platform helps Kubernetes administrators handle common storage challenges - including the use of Kubernetes for cloud native, resilient applications. Besides this, our idea of leveraging the CAS architecture was to enrich Kubernetes storage with additional benefits such as a lower blast radius, vendor and cloud provider agnostic storage, small team agility and control, and granularity of storage policies to embrace the specific and highly varied needs of workloads.

With its 3.0 release, OpenEBS is more capable and more mature in its support for stateful applications. But the journey doesn’t stop here and there is more to come!

This article was originally published at https://blog.mayadata.io/why-openebs-3.0-for-kubernetes-and-storage and is republished with authorization from MayaData.