<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Patrick McFadin</title>
    <description>The latest articles on DEV Community by Patrick McFadin (@patrickmcfadin).</description>
    <link>https://dev.to/patrickmcfadin</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F745711%2F06963db7-c26d-4ecd-9d23-bb59c299844b.jpg</url>
      <title>DEV Community: Patrick McFadin</title>
      <link>https://dev.to/patrickmcfadin</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/patrickmcfadin"/>
    <language>en</language>
    <item>
      <title>The Serverless Database You Really Want</title>
      <dc:creator>Patrick McFadin</dc:creator>
      <pubDate>Tue, 24 May 2022 15:15:17 +0000</pubDate>
      <link>https://dev.to/datastax/the-serverless-database-you-really-want-2i14</link>
      <guid>https://dev.to/datastax/the-serverless-database-you-really-want-2i14</guid>
      <description>&lt;p&gt;The dreaded part of every site reliability engineer’s (SRE) job eventually: capacity planning. You know, the dance between all the stakeholders when deploying your applications. Did engineering really simulate the right load and do we understand how the application scales? Did product managers accurately estimate the amount of usage? Did we make architectural decisions that will keep us from meeting our SLA goals? And then the question that everyone will have to answer eventually: how much is this going to cost? This forces SREs to assume the roles of engineer, accountant and fortune teller.&lt;/p&gt;

&lt;p&gt;The large cloud providers understood this a long time ago and so the term “cloud economics” was coined. Essentially this means: rent everything and only pay for what you need. I would say this message worked because we all love some cloud. It’s not a fad either. SREs can eliminate a lot of the downside when the initial infrastructure capacity discussion was &lt;em&gt;maybe&lt;/em&gt; a little off. Being wrong is no longer devastating. Just add more of what you need and in the best cases, the services scale themselves — giving everyone a nice night’s sleep. All this without provisioning a server, which gave rise to the term “serverless.”&lt;/p&gt;

&lt;p&gt;As serverless methodologies have burned through the application tiers, databases have turned out to be the last big thing to feel the heat of progress. No surprise though. Stateful workloads — as in information I really want to keep — are a much harder problem to solve than stateless workloads. The cloud providers have all released their own version of a serverless database, provided you agree to be locked into their walled garden. Open source has always served as the antidote for the dreaded lock-in, and there are really exciting things happening in the Apache Cassandra community in that regard.&lt;/p&gt;

&lt;h2&gt;&lt;strong&gt;The Oracle That Foretold the Future&lt;/strong&gt;&lt;/h2&gt;

&lt;p&gt;In the early days of distributed databases, a groundbreaking paper changed everything: the &lt;a href="https://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf" rel="noopener noreferrer"&gt;Dynamo paper&lt;/a&gt; from Amazon, published in 2007. In it, a team of researchers and engineers described how a system could be built to maximize availability and performance while balancing consistency, scale and operations. To quote the paper: “A highly available key-value storage system that some of Amazon’s core services use to provide an ‘always-on’ experience.” It served as the basis for several database implementations, including what would become Apache Cassandra.&lt;/p&gt;

&lt;p&gt;Dynamo assumed the availability of cheap, commodity hardware in the coming cloud era. As our industries have slowly morphed into building cloud native applications, the definition of commodity hardware has changed. Instead of units being bare-metal or virtual machines, we consume individual scale components of network, compute and storage. Building a serverless Cassandra database continues the work of the Dynamo paper inside this new paradigm; and with it, new scaling and deployment options that fit our cloud native world.&lt;/p&gt;

&lt;h2&gt;&lt;strong&gt;Defining Commodity&lt;/strong&gt;&lt;/h2&gt;

&lt;p&gt;In 2007, when the paper was first published, the definition of a commodity was very different from today’s. Most server-class systems were bulky and incredibly complex, built to deliver the compute power and uptime required. “Commodity” entailed very inexpensive, small servers with the most basic CPU, disk and memory. The first time I deployed Cassandra in my infrastructure, I was able to use commodity servers to scale out, saving a lot of money while achieving better results.&lt;/p&gt;

&lt;p&gt;Then along came the cloud and even more changes in definitions. Commodity was now an instance type we could provision and pay for by the hour. This fueled a massive expansion of scale applications and the rise of cloud native, but CPU, disk and memory all still had to be considered, especially in stateful workloads like a database. So, the dreaded capacity planning discussion was still happening in deployment meetings. Thankfully, the impact of making a wrong decision was much less when using cloud infrastructure, especially with Cassandra. Need more? Just add more instances to your cluster. Goodbye capacity wall, hello scale.&lt;/p&gt;

&lt;p&gt;Now we are at a time when Kubernetes is pushing the boundaries of what we can do with cloud native applications. With it, we’ve seen yet another shift in commodity definitions. The classic deployable server or instance type has been decomposed into compute, network and storage. Kubernetes has created a way for us to define and deploy an entire virtual data center, with the parts we need to support the applications we are deploying. Containers allow for precise control over the compute needed (and when).&lt;/p&gt;

&lt;p&gt;Software-defined networks do all the complicated virtual wiring in our data centers dynamically. All of which creates an environment that is elastic, scalable, and self-healing. We also get the added benefit of fine-grained cost controls. Goodbye over-provisioning, hello cloud economics.&lt;/p&gt;

&lt;h2&gt;&lt;strong&gt;Open Source: Now More Important Than Ever&lt;/strong&gt;&lt;/h2&gt;

&lt;p&gt;As with the majority of data infrastructure innovations of the past 10 years, the breadth and depth of the needed changes can only be addressed by an engaged community of users. The revolution in serverless databases will happen in open source. Clouds moved fast on early serverless implementations, but as we in open source know: to go far, we go together. The cloud economics of using a vendor-specific serverless database works great, right up until it doesn’t. Free as in freedom means you should be able to use it anywhere: in a cloud, in your own datacenter, or even on your laptop.&lt;/p&gt;

&lt;p&gt;One aspect that has driven the popularity of Kubernetes is the undeniable benefit of cloud portability and freedom. Overlay your deployment of a virtual datacenter against any provider of commodity compute, network and storage. Don’t like where you are? Take your data center somewhere else. Don’t like renting the services in a cloud? Run them yourself in Kubernetes. The near future will be about creating new cloud data services in Kubernetes and the communities we form around this exciting part of modern data applications.&lt;/p&gt;

&lt;p&gt;The Dynamo pedigree of Apache Cassandra and years of proven reliability in the biggest workloads put it in a strong position for the next revolution of serverless databases. At &lt;a href="https://www.datastax.com/?utm_content=inline-mention" rel="noopener noreferrer"&gt;DataStax&lt;/a&gt;, we are the company that just loves open source Cassandra; we have seen this future direction of databases unfolding and we’re excited to participate. We have also been building our own deep experience of running large-scale database cloud deployments in Kubernetes, via &lt;a href="https://astra.dev/38F0Dg1" rel="noopener noreferrer"&gt;DataStax Astra&lt;/a&gt;. As a result, our engineering teams have created some of the beginning work for a serverless Cassandra. We will be refining and building knowledge about how to take advantage of the new cloud native commodity definitions and passing on the lower costs of cloud economics.&lt;/p&gt;

&lt;p&gt;Expect to see our ideas and code in a GitHub repository soon and discussions opening about what we have learned. Already the Cassandra community is talking about what will happen after 4.0 and it’s safe to say that a serverless Cassandra is top of the list. Inclusion in the open source project &lt;a href="https://k8ssandra.io/" rel="noopener noreferrer"&gt;K8ssandra&lt;/a&gt;, combined with the &lt;a href="http://stargate.io/" rel="noopener noreferrer"&gt;Stargate&lt;/a&gt; project, will further expand the freedom of deployment options and usage.&lt;/p&gt;

&lt;p&gt;Data on Kubernetes depends on true cloud economics and scale, which takes us back to our SREs. In the near future when they are thinking about capacity planning, I would love to give them the option of having one less stressful meeting.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;a href="https://www.datastax.com/astradb/serverless" rel="noopener noreferrer"&gt;Learn more&lt;/a&gt; about how &lt;a href="https://astra.dev/38F0Dg1" rel="noopener noreferrer"&gt;DataStax Astra&lt;/a&gt; is enabling faster application development and streamlined operations with serverless databases.&lt;/em&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>The End of the Beginning for Apache Cassandra</title>
      <dc:creator>Patrick McFadin</dc:creator>
      <pubDate>Tue, 17 May 2022 20:37:57 +0000</pubDate>
      <link>https://dev.to/datastax/the-end-of-the-beginning-for-apache-cassandra-2180</link>
      <guid>https://dev.to/datastax/the-end-of-the-beginning-for-apache-cassandra-2180</guid>
      <description>&lt;p&gt;&lt;em&gt;Editor’s note: This story originally ran on July 27, 2021, the day that Apache Cassandra 4.0 was released.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Today is a big day for those of us in the &lt;a href="https://cassandra.apache.org/" rel="noopener noreferrer"&gt;Apache Cassandra&lt;/a&gt; community. After a long uphill climb, Apache Cassandra 4.0 has &lt;a href="https://thenewstack.io/apache-cassandra-4-0-comes-in-ready-for-production/" rel="noopener noreferrer"&gt;finally shipped&lt;/a&gt;. I say finally, because it has at times seemed like an elusive goal. I’ve been involved in the Cassandra project for almost 10 years now and I have seen a lot of ups and downs. So I feel this day marks an important milestone that isn’t just a version number. This is an important milestone in the lifecycle of a database project that has come into its own as an important database used around the world. The 4.0 release is not only the most stable in the history of Cassandra, but quite possibly the most stable release of any database. Now it’s ready to launch into the next 10 years of cloud native data; it has the computer science and hard-won history to make a huge impact. Today’s milestone is the end of the beginning.&lt;/p&gt;

&lt;h2&gt;Not the hero you wanted, but the hero you need&lt;/h2&gt;

&lt;p&gt;In early 2011, I was having lunch with the person whom I would call the first evangelist for Cassandra: &lt;a href="https://twitter.com/adrianco" rel="noopener noreferrer"&gt;Adrian Cockcroft&lt;/a&gt;. At the time, he was helping transform Netflix from a mail-based DVD company to a streaming company that required a lot of cutting-edge technology, some of which hadn’t even been invented yet. We talked about many things, but the only thing I remember was when he told me I should try out this distributed database they were using called Cassandra. It was one of the dozens of NoSQL databases exploding on the scene, as those of us trying to scale infrastructure based on increasing demand were finding the limits of Oracle and MySQL. That night I had a cluster of Cassandra 0.7 running in &lt;a href="https://aws.amazon.com/" rel="noopener noreferrer"&gt;Amazon Web Services&lt;/a&gt;, and I haven’t stopped since.&lt;/p&gt;

&lt;p&gt;These were the early years of Cassandra, and as is typical early in the 30- to 40-year lifespan of any database, it was a time of amazing growth in features and innovation. Cassandra was being adopted in organizations that needed scale and were ready to devote the engineering time to keep the pace of innovation fast. The computer science was really clear: To meet the type of scale requirements modern applications need, you have to use a coordination-free database that is built for availability and partition tolerance. There were teams using other technologies, but not always successfully. Because of this, Cassandra earned the reputation as the database that wouldn’t let you down, though it was really hard to learn. If all other databases failed to deliver the needed uptime or scale, Cassandra could do the job.&lt;/p&gt;

&lt;h2&gt;Coming of age isn’t easy&lt;/h2&gt;

&lt;p&gt;In 2016, Cassandra 3.0 was released and one of the big changes was a completely new storage engine. Anyone who has worked in operations knows that major alterations to core components need their time in service before reaching a stability point that’s generally trusted. Cassandra wasn’t immune to this. With a lot of initial issues in the 3.0 storage engine, most users opted to stay with 2.1 and wait to upgrade. At about that time, DataStax was pulling away from the project, which led to a lot of internal project conflict. Apache Cassandra had arrived at the awkward adolescent years.&lt;/p&gt;

&lt;p&gt;Just like human teenagers, the Apache Cassandra project was having a moment of asking itself, “What do I want to be when I grow up?” That conversation was happening between the contributors, committers and the project management committee (PMC). Stability and correctness are the only things that count for a database that a large part of the world depends on as a source of truth. At ApacheCon 2019, I attended a large, ad-hoc gathering of people to discuss what standards the Cassandra project wanted to hold for a version release. We didn’t have a conference room, so we all sat on the floor debating in a side hallway of the Flamingo Las Vegas. In the end, we agreed that a single statement embodies how we have to move forward, and it was adopted by the project: “The overarching goal of the 4.0 release is that Cassandra 4.0 should be at a state where major users would run it in production when it is cut.”&lt;/p&gt;

&lt;p&gt;The idea that a dot-zero release could be considered production stable doesn’t fit in many operators’ world views. Of course, you have to wait at least a few bug releases before trying your hand at an upgrade, right? The members of the Apache Cassandra community decided to challenge that idea. What is the point of a beta release or a release candidate? Since this is being built not in a cathedral but in the open bazaar, a real contribution to the project will be running a beta with production workloads. And before getting to the beta, we need to be able to test correctness against the variety of ways failure can happen, consistently and continuously. Incredible tools have been built in the project in the past few years that are unmerciful in the failure modes they present. The payoff has been real. Apache Cassandra 4.0 is green on all tests and, as promised, being run in production by the organizations sponsoring engineering time. It’s being released because the members of the project believe in the promise that this will be the most stable database you can use.&lt;/p&gt;

&lt;h2&gt;On the shoulders of giants&lt;/h2&gt;

&lt;p&gt;This is how we got to today. A solid pedigree from the beginning, years of innovation and a commitment to quality. The database is trusted by companies like Netflix, Uber, Flipkart, ING Bank and hundreds of others. And now we are on the cusp of a new era of Cassandra. Truly the end of the beginning. So what is next for Cassandra?&lt;/p&gt;

&lt;p&gt;Quite a lot actually.&lt;/p&gt;

&lt;p&gt;To get quality management to a place the project needed, there had to be an early code freeze to stop continuous changes and the endless tail-chasing they can cause. This has meant that a relatively small number of big features have been released, while innovation has waited on the sidelines with an eye toward the days after 4.0, when the code freeze is lifted. Beyond the inner circle of the project, we already have a great &lt;a href="https://cassandra.apache.org/ecosystem/" rel="noopener noreferrer"&gt;ecosystem&lt;/a&gt; of projects and companies around Cassandra that increase access and make it easier to use. Expect to see this grow even more as the need for a stable foundation in distributed applications increases. Projects like &lt;a href="https://k8ssandra.io/" rel="noopener noreferrer"&gt;K8ssandra&lt;/a&gt; and &lt;a href="https://stargate.io/" rel="noopener noreferrer"&gt;Stargate&lt;/a&gt; can rely on Cassandra and focus on their own project goals. If you have a project that needs a reliable and trusted data store like Cassandra, many people in our community are ready to help. You just need to ask.&lt;/p&gt;

&lt;p&gt;The Cassandra Enhancement Proposal (&lt;a href="https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652201" rel="noopener noreferrer"&gt;CEP&lt;/a&gt;) process was put in place to bring major new features into Cassandra. Several have already been started, including &lt;a href="https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-7%3A+Storage+Attached+Index" rel="noopener noreferrer"&gt;Storage Attached Indexing&lt;/a&gt;, which is designed to be a replacement for the original secondary indexing and a part of DataStax’s re-commitment to support Apache Cassandra. Others that have been proposed include adding joins to Cassandra (yes, I said that) and important upgrades to transactions. The change proposed with potentially the widest impact is the implementation of pluggable storage. This grants the ability to use a variety of storage engines with Cassandra, optimized for certain workloads. Instagram had already shown early promise with this idea by adding RocksDB as a storage engine, so the possibilities are really exciting. In parallel, Cassandra will be taking advantage of the rapidly evolving innovations in Java garbage collection. The Z Garbage Collector (ZGC) is now delivering &lt;a href="https://malloc.se/blog/zgc-jdk16" rel="noopener noreferrer"&gt;sub-millisecond&lt;/a&gt; pause times in JDK 16 and even bigger gains in the near future. The impact on Cassandra and other JVM-based systems will be profound. Stay tuned for some mind-blowing benchmarks.&lt;/p&gt;
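For readers who want to experiment, ZGC can be switched on through the JVM options file that ships with Cassandra 4.0 when running on JDK 11 or newer. This is a hedged sketch, not a tuning recommendation; exact file names and flags depend on your JDK version and packaging:

```
# conf/jvm11-server.options (Cassandra 4.0 on JDK 11)
# ZGC is still experimental in JDK 11, so it must be unlocked first;
# comment out the default G1 settings in this file before adding:
-XX:+UnlockExperimentalVMOptions
-XX:+UseZGC
```

On JDK 15 and later, ZGC is a production feature and `-XX:+UseZGC` alone is enough.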

&lt;p&gt;If you are a current user of Apache Cassandra, you should consider an upgrade soon. The stability and performance improvements will make it worthwhile. If you’re new to Cassandra, now is a great time to get in and give it a try. You won’t be disappointed, and the rocket ship is getting ready to take off again — make sure not to miss this ride. We would love for you to be in &lt;a href="https://cassandra.apache.org/community/" rel="noopener noreferrer"&gt;our community&lt;/a&gt;. The next era of Cassandra is going to be exciting and full of its own challenges. One thing’s for sure: it won’t be like the last era.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Learn more about &lt;a href="https://astra.dev/3yQtjwS" rel="noopener noreferrer"&gt;DataStax Astra DB&lt;/a&gt;, the DBaaS built on Cassandra.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This story was originally published in &lt;a href="https://thenewstack.io/the-end-of-the-beginning-for-apache-cassandra/" rel="noopener noreferrer"&gt;The New Stack&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Kubernetes and Apache Cassandra: What Works (and What Doesn’t)</title>
      <dc:creator>Patrick McFadin</dc:creator>
      <pubDate>Thu, 17 Feb 2022 18:19:36 +0000</pubDate>
      <link>https://dev.to/datastax/kubernetes-and-apache-cassandra-what-works-and-what-doesnt-37n7</link>
      <guid>https://dev.to/datastax/kubernetes-and-apache-cassandra-what-works-and-what-doesnt-37n7</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjhazrbl6xbjblzdd9vr3.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjhazrbl6xbjblzdd9vr3.jpeg" alt="Image description" width="800" height="480"&gt;&lt;/a&gt;&lt;br&gt;
“I need it now and I need it reliable”&lt;/p&gt;

&lt;p&gt;– ANYONE WHO HASN’T DEPLOYED APPLICATION INFRASTRUCTURE&lt;/p&gt;

&lt;p&gt;If you’re on the receiving end of this statement, we understand you here in the &lt;a href="https://dtsx.io/3uZHFqS" rel="noopener noreferrer"&gt;K8ssandra community&lt;/a&gt;. We do have reason for hope, though. Recent &lt;a href="https://dok.community/dokc-2021-report/" rel="noopener noreferrer"&gt;surveys&lt;/a&gt; have shown that &lt;a href="https://kubernetes.io/" rel="noopener noreferrer"&gt;Kubernetes&lt;/a&gt; (K8s) is growing in popularity, not only because it’s powerful technology, but because it actually delivers on reducing the toil of deployment.&lt;/p&gt;

&lt;p&gt;Deploy an application on Kubernetes and it’s easy to miss the almost magical, complex orchestration that happens as compute, network, and storage are all aligned into what you declared in a YAML file. With the size and scale of applications we need for modern cloud applications, we could certainly use a little magic.&lt;/p&gt;
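That declarative model looks something like the following in practice. This is a minimal, hypothetical sketch (names, image, and resource figures are illustrative); a real manifest for a stateful service would be more involved:

```yaml
# Hypothetical manifest declaring compute for one service;
# Kubernetes orchestrates real infrastructure to match what is declared here.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3                 # desired compute: three identical pods
  selector:
    matchLabels: {app: web}
  template:
    metadata:
      labels: {app: web}
    spec:
      containers:
      - name: web
        image: nginx:1.25
        resources:
          requests: {cpu: "500m", memory: "256Mi"}  # fine-grained cost control
```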

&lt;p&gt;While Kubernetes is the go-to orchestration platform to run distributed applications in the cloud, Cassandra provides a dependable distributed database environment.&lt;/p&gt;

&lt;p&gt;If you go back to when we were using shell scripts on bare-metal, you’ll find &lt;a href="https://cassandra.apache.org/_/index.html" rel="noopener noreferrer"&gt;Apache Cassandra®&lt;/a&gt; proudly growing with the tech stacks of companies like Uber, Spotify, and Netflix. Known for its robust, distributed, and scalable infrastructure — with no single point of failure and high availability — it’s the top choice for any business operating large-scale cloud applications that need to reliably maintain their “always-on” services.&lt;/p&gt;

&lt;p&gt;It’s common to see Cassandra and K8s described as the “most logical pairing” since they allow you to keep your data and operations close for better performance at scale. So let’s take a closer look at what makes Cassandra and K8s a dream team — and why this isn’t always true.&lt;/p&gt;

&lt;h2&gt;&lt;strong&gt;Why Cassandra and K8s work so well together&lt;/strong&gt;&lt;/h2&gt;

&lt;p&gt;Like any great story of modern infrastructure, it all started with the need to scale. Before the term “cloud native” became mainstream, Cassandra was inspired by the distributed storage and replication techniques from Amazon’s &lt;a href="https://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf" rel="noopener noreferrer"&gt;Dynamo&lt;/a&gt;, as well as the data storage engine model from Google’s &lt;a href="https://static.googleusercontent.com/media/research.google.com/en//archive/bigtable-osdi06.pdf" rel="noopener noreferrer"&gt;Bigtable&lt;/a&gt;. The idea was to build the best of both into one package to help guarantee availability and resiliency for large-scale, business-critical applications.&lt;/p&gt;

&lt;p&gt;Guided by a similar philosophy, Kubernetes is also designed to be reliable, easy to scale, and highly available. The main difference is that Cassandra has a leaderless, shared-nothing architecture (i.e. nodes don’t share memory or storage), whereas Kubernetes has a primary node — the control plane — which runs across multiple machines to provide fault-tolerance.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fizavos64gfeswxn75ubv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fizavos64gfeswxn75ubv.png" alt="Image description" width="800" height="373"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Essentially, both Cassandra and Kubernetes are distributed systems designed to meet the snowballing requirements for data and storage in global-scale apps. It’s unlikely we’ll have less data and infrastructure in the future, so we need strategies that embody the following ideas:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Scale:&lt;/strong&gt; Cassandra and K8s both allow for horizontal or vertical scaling, and are both based on nodes, which lets developers expand or shrink their infrastructure with no downtime or third-party software. You can simply tell K8s how much you want to resize your Cassandra cluster and let it deal with the logistics.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Elasticity:&lt;/strong&gt; With the ability to dynamically add or remove nodes, developers can build and run distributed applications that automatically scale, based on demand, to free resources outside of peak load periods. One of the more critical elements of controlling costs in scale infrastructure is finding ways to prevent paying for idle resources.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Self-healing:&lt;/strong&gt; K8s will instantly redeploy failed containerized apps, and Cassandra makes it easy to recover failed nodes without losing any data thanks to its built-in replication. For example, Spotify uses Cassandra to easily replicate the data between their EU and US data centers, allowing &lt;a href="https://engineering.atspotify.com/2015/01/09/personalization-at-spotify-using-cassandra/" rel="noopener noreferrer"&gt;Spotify’s music personalization system&lt;/a&gt; to reach their users if any single center should experience a failure.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxf360c9kq218obe55zkk.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxf360c9kq218obe55zkk.jpg" alt="Image description" width="800" height="393"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The reason both K8s and Cassandra can accomplish similar things is rooted in the architecture of distributed systems. Namely, building systems that act as a team where no one piece is critical. When comparing Figure 1 to Figure 2, you can see some basic similarities; nodes act independently and communicate via a network to coordinate and exchange data.&lt;/p&gt;

&lt;p&gt;The concept of a node in distributed computing is a basic unit of scale and resilience. Cassandra overlays nicely on a K8s cluster with concerns for compute, network, and storage being managed independently. Again, the main difference is the control plane, which leads us to our next point.&lt;/p&gt;

&lt;h2&gt;&lt;strong&gt;How sometimes they &lt;span&gt;don’t&lt;/span&gt; work well together&lt;/strong&gt;&lt;/h2&gt;

&lt;p&gt;It would appear Kubernetes and Cassandra are highly compatible and can coexist peacefully in your tech stack. After all, they’re both distributed, scalable, and resilient — except they’re actually more like two pieces from different puzzles that don’t quite fit together without some elbow grease.&lt;/p&gt;

&lt;p&gt;Take Cassandra operators, for example. In Kubernetes, the compute and storage are separate rather than managed as a group. So in a failure scenario, K8s could replace a node without attaching the precious storage data. The challenge is keeping the storage with the Cassandra node that owns the data, which is simple to do using operators like &lt;a href="https://dtsx.io/3ByF2yK" rel="noopener noreferrer"&gt;cass-operator&lt;/a&gt; instead of hours of manual work.&lt;/p&gt;
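As a sketch of how an operator keeps scale and storage declarative, here is a hypothetical CassandraDatacenter fragment for cass-operator (field values are illustrative; check the operator's documentation for the exact schema of your version):

```yaml
apiVersion: cassandra.datastax.com/v1beta1
kind: CassandraDatacenter
metadata:
  name: dc1
spec:
  clusterName: cluster1
  serverType: cassandra
  serverVersion: "4.0.0"
  size: 3                           # desired node count; change and re-apply to scale
  storageConfig:
    cassandraDataVolumeClaimSpec:   # data lives on PVCs that stay with each node
      storageClassName: standard
      accessModes: [ReadWriteOnce]
      resources:
        requests: {storage: 100Gi}
```

The point of the `storageConfig` section is exactly the failure scenario above: the operator re-attaches a replaced node to its existing volume claim instead of losing the data with the pod.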

&lt;p&gt;In recent years we’ve seen a flurry of open-source technologies designed to solve some of the challenges around K8s and Cassandra. &lt;a href="https://dtsx.io/3v2PSKZ" rel="noopener noreferrer"&gt;K8ssandra&lt;/a&gt;, for example, is a production-ready, open-source project that abstracts away the transactional and operational aspects of Cassandra deployments, making it easy to deploy and manage Cassandra on any K8s engine (and they mean &lt;strong&gt;any&lt;/strong&gt;).&lt;/p&gt;

&lt;p&gt;As compatible as K8s and Cassandra may be, sometimes you don’t want to have to piece things together; you just want the full picture pre-assembled so you can move on to building better things. The bottom line is: there’s a growing need for a cloud-native database that’s inherently designed for Kubernetes. One that’s built to leverage its capabilities, not dance around them.&lt;/p&gt;

&lt;p&gt;In the next post, we’ll take a closer look at how Cassandra pushes the boundaries of K8s, and explore how the pipe dream of a cloud-native database designed specifically for K8s might not be just a dream after all.  Curious to learn more about (or play with) Cassandra itself? We recommend trying it on the &lt;a href="https://astra.dev/3v1Y6Fd" rel="noopener noreferrer"&gt;Astra DB&lt;/a&gt; free plan for the fastest setup.&lt;/p&gt;

&lt;h2&gt;&lt;strong&gt;Resources&lt;/strong&gt;&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;a href="https://dtsx.io/3v2PSKZ" rel="noopener noreferrer"&gt;K8ssandra – Apache Cassandra® on Kubernetes&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dtsx.io/3DzVKi1" rel="noopener noreferrer"&gt;GitHub: K8ssandra&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://avinetworks.com/glossary/kubernetes-architecture/" rel="noopener noreferrer"&gt;Kubernetes Architecture&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.dfurnes.com/notes/dynamo" rel="noopener noreferrer"&gt;The Dynamo Paper – David Furnes&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dtsx.io/3ByF2yK" rel="noopener noreferrer"&gt;cass-operator: The DataStax Kubernetes Operator for Apache Cassandra&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

</description>
    </item>
    <item>
      <title>The future of cloud-native databases begins with Apache Cassandra 4.0</title>
      <dc:creator>Patrick McFadin</dc:creator>
      <pubDate>Tue, 25 Jan 2022 17:12:14 +0000</pubDate>
      <link>https://dev.to/datastax/the-future-of-cloud-native-databases-begins-with-apache-cassandra-40-i4a</link>
      <guid>https://dev.to/datastax/the-future-of-cloud-native-databases-begins-with-apache-cassandra-40-i4a</guid>
      <description>&lt;p&gt;“Reliability at massive scale is one of the biggest challenges we face at Amazon.com, one of the largest e-commerce operations in the world; even the slightest outage has significant financial consequences and impacts customer trust.” &lt;/p&gt;

&lt;p&gt;This was the first line of the highly impactful &lt;strong&gt;&lt;a href="http://www.cs.cornell.edu/courses/cs5414/2017fa/papers/dynamo.pdf" rel="noopener noreferrer"&gt;paper&lt;/a&gt;&lt;/strong&gt; titled “Dynamo: Amazon’s Highly Available Key-value Store.” Published in 2007, it was written at a time when the status quo of database systems was not working for the massive explosion of internet-based applications. A team of computer engineers and scientists at Amazon completely re-thought the idea of data storage in terms of what would be needed for the future, with a firm footing in the computer science of the past. They were trying to solve an immediate problem but they had unwittingly sparked a huge revolution with distributed databases and the eventual collision with cloud-native applications. &lt;/p&gt;

&lt;h3&gt;
  
  
  The original cloud-native database
&lt;/h3&gt;

&lt;p&gt;A year after the Dynamo paper, one of the authors, Avinash Lakshman, joined forces with Prashant Malik at Facebook and built one of the many implementations of Dynamo, called Cassandra. Because they worked at Facebook, they were facing scale problems very few companies were dealing with at the time. Another Facebook tenet in 2008: Move fast and break things. The reliability that was at the top of Amazon’s wish list for Dynamo? Facebook was challenging that daily with frenetic non-stop growth. Cassandra was built on the cloud-native principles of scale and self-healing—keeping the world's most important workloads at close to 100% uptime and having been tempered in the hottest scale fires. Now, with the release of Cassandra 4.0, we are seeing the beginning of what’s next for a proven database and the cloud-native applications that will be built in the future. The stage is set for a wide range of innovation—all built on the shoulders of the Dynamo giant. &lt;/p&gt;

&lt;h3&gt;
  
  
  The prima donna comes to Kubernetes
&lt;/h3&gt;

&lt;p&gt;The previous generation of databases before the NoSQL revolution arguably drove a lot of innovation in the data center. It was typical to spend the most time and money on the “big iron” database server that was required to keep up with demand. We built some amazing palaces of data on bare metal, which made virtualizing database workloads a hard sell in the early 2000s. In most cases, database infrastructure sat on dedicated hardware next to the virtualized systems of the application. As cloud adoption grew, similar issues persisted. Ephemeral cloud instances worked great for web and app servers, but “commodity” was a terrible word for the precious database. The transition from virtualization to containerization only increased the cries of “never!” from database teams. Undaunted, Kubernetes moved forward with stateless workloads, and databases remained on the sidelines once again. Those days are now numbered. Technical debt can grow unbounded if left unchecked. Organizations don’t want multiple versions of infrastructure to manage—it requires hiring more people and keeping track of more stuff. When deploying virtual datacenters with Kubernetes, the database has to be a part of it. &lt;/p&gt;

&lt;p&gt;Some objections are valid when it comes to running a database in a container. The reasons we built specialized hardware for databases are the same reasons we need to pay attention to certain parts of a containerized database. High-performance file systems. Placement of the system away from other containers that could create possible contention and reduce performance. With distributed databases like Apache Cassandra, placement of individual nodes in a way that hardware failure doesn’t impact database uptime. Databases that have proven themselves before Kubernetes are trying to find ways to run on Kubernetes. The future of databases and Kubernetes requires we replace the word “on” with “in” and the change has to happen on the database side. The current state of the art for “Runs on Kubernetes” is the use of operators to translate how databases want to work into what Kubernetes wants them to do. Our bright future of “Runs in Kubernetes” means databases use more of what Kubernetes offers with resource management and orchestration for basic operation of the database. Ironically, it means that many databases could remove entire parts of their code base as they hand that function to Kubernetes (reducing the surface area for bugs and potential security flaws). &lt;/p&gt;
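&lt;p&gt;To make the current “Runs on Kubernetes” state of the art concrete, here is a minimal sketch of the kind of custom resource an operator such as cass-operator consumes. The field names follow published cass-operator examples, but treat the exact API version and values as assumptions and check the project documentation for your release:&lt;/p&gt;

```yaml
# Illustrative CassandraDatacenter custom resource for cass-operator.
# The operator translates this declarative spec into the StatefulSets,
# Services, and ordered operations that Cassandra needs.
apiVersion: cassandra.datastax.com/v1beta1
kind: CassandraDatacenter
metadata:
  name: dc1
spec:
  clusterName: cluster1
  serverType: cassandra
  serverVersion: "4.0.1"
  size: 3                          # desired number of Cassandra nodes
  storageConfig:
    cassandraDataVolumeClaimSpec:
      storageClassName: standard   # assumed storage class name
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 10Gi
```

&lt;p&gt;Applying a spec like this hands orchestration to the operator: scaling the cluster becomes a one-line change to &lt;code&gt;size&lt;/code&gt; rather than a manual runbook.&lt;/p&gt;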

&lt;h3&gt;
  
  
  Cassandra is ready for what’s next
&lt;/h3&gt;

&lt;p&gt;The fact that Apache Cassandra 4.0 was recently released is a huge milestone for the project when it comes to stability and a mature codebase. The project is now looking forward to future Cassandra versions building on this solid foundation. Primarily, how can it support the larger ecosystem around it by becoming a rock-solid foundation for other data infrastructure? During the past decade, Cassandra has built a reputation as a highly performant and resilient database. With the types of modern cloud-native applications we need to write, we’ll only need more of that—interoperability will only become more important for Cassandra. &lt;/p&gt;

&lt;p&gt;To think of what a cloud-native Cassandra would look like, we should look at how applications are deployed in Kubernetes. The notion of deploying a single monolith should be left rusting in the same pile that my old Sun E450 database server is in now. Cloud-native apps are modular and declarative and adhere to the principles of scalability, elasticity, and self-healing. They get their control and coordination from the Kubernetes cluster and participate with other parts of the application. The need for capacity is directly linked to the needs of the running application, and everything is orchestrated as part of the total application. The virtual data center acts as a unit but can survive underlying hardware problems and work around them. &lt;/p&gt;

&lt;h3&gt;
  
  
  Ecosystem as a first-class citizen
&lt;/h3&gt;

&lt;p&gt;The future of Cassandra in Kubernetes isn’t about what it does alone. It's about what new capabilities it brings to the system as a whole. Projects like Stargate create a gateway for developers to build API-based applications without interacting with the underlying data store. Data as a service deployed by you, in your own virtual data center, using Kubernetes. Cassandra itself may be using enabling projects such as &lt;strong&gt;&lt;a href="https://docs.openebs.io/docs/next/cassandra.html" rel="noopener noreferrer"&gt;OpenEBS&lt;/a&gt;&lt;/strong&gt; to manage database-class storage, or &lt;strong&gt;&lt;a href="https://docs.datastax.com/en/cass-operator/doc/cass-operator/cassOperatorMetricReporterDashboards.html" rel="noopener noreferrer"&gt;Prometheus&lt;/a&gt;&lt;/strong&gt; to store metrics. You may even find yourself using Cassandra without it being part of your application. Projects like &lt;strong&gt;&lt;a href="https://docs.temporal.io/docs/server/configuration/#persistence" rel="noopener noreferrer"&gt;Temporal&lt;/a&gt;&lt;/strong&gt; use Cassandra as the underlying storage for their persistence. When you have a data service that deploys easily and scales across multiple regions, it’s an obvious choice.&lt;/p&gt;

&lt;p&gt;From the spark of innovation that started with the Dynamo paper at Amazon to the recent release of 4.0, Cassandra was destined to be the cloud-native database we all need. The next ten years of data on Kubernetes will see even more innovation as we take the once ivory palace of the database server and make it an equal player as a data service in the application stack. Cassandra is built for that future and ready to go with what is possibly the most stable database release ever in 4.0. If you are interested in joining the data on Kubernetes revolution, you can find an amazing community of like-minded individuals at the &lt;strong&gt;&lt;a href="https://dok.community/" rel="noopener noreferrer"&gt;Data on Kubernetes Community&lt;/a&gt;&lt;/strong&gt;. If you want to help make Cassandra the default Kubernetes data store, you can join us at the &lt;strong&gt;&lt;a href="https://cassandra.apache.org/community/" rel="noopener noreferrer"&gt;Cassandra project&lt;/a&gt;&lt;/strong&gt; or more specifically the Cassandra on Kubernetes project, &lt;strong&gt;&lt;a href="https://k8ssandra.io/" rel="noopener noreferrer"&gt;K8ssandra&lt;/a&gt;&lt;/strong&gt;.  If you are new to Cassandra, &lt;a href="https://astra.dev/3qOEh0j" rel="noopener noreferrer"&gt;Astra DB&lt;/a&gt; is a great (free) place to learn with none of the infrastructure setup headaches.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>New Survey Finds Data on Kubernetes Is No Longer a Pipe Dream</title>
      <dc:creator>Patrick McFadin</dc:creator>
      <pubDate>Thu, 13 Jan 2022 17:25:01 +0000</pubDate>
      <link>https://dev.to/datastax/new-survey-finds-data-on-kubernetes-is-no-longer-a-pipe-dream-1i56</link>
      <guid>https://dev.to/datastax/new-survey-finds-data-on-kubernetes-is-no-longer-a-pipe-dream-1i56</guid>
      <description>&lt;p&gt;``For people that work in infrastructure and application development, the pace of change is quick. Finish one project and it’s on to the next. Each iteration requires an evaluation asking if the right technology is being used and if it provides a new advantage. Kubernetes has been on the fast track of continuous evaluation. New projects and methodologies are continuously emerging and it can be hard to keep up. Then there is the question of running stateful services. The Data on Kubernetes community has &lt;strong&gt;&lt;a href="https://dok.community/dokc-2021-report/" rel="noopener noreferrer"&gt;released a report&lt;/a&gt;&lt;/strong&gt; titled “Data on Kubernetes 2021” to give us a snapshot of where our industry sits with stateful workloads. Over 500 executives and tech leaders were asked some very direct and insightful questions about how they use Kubernetes. It turns out that there were a lot of surprising finds. Some that I would have never predicted. Let’s dig into some of the highlights that stood out to me. &lt;/p&gt;

&lt;p&gt;One of the most important findings in the survey is how recently the trend of running stateful workloads on Kubernetes has turned from the minority to the majority. Just a year ago, most organizations running Kubernetes were managing stateful workloads in a separate environment: microservices running in Kubernetes, databases running on bare metal or VMs. Now, in a major shift, a majority (70% of respondents) say they are embracing stateful workloads and are full steam ahead (&lt;em&gt;Figure 1&lt;/em&gt;). &lt;/p&gt;

&lt;p&gt;What happened? This is likely due to the meeting of two important factors. Kubernetes has undergone a lot of important changes to make stateful workloads first-class, especially around storage and stable pod identity with features such as StatefulSets. The other is the general maturity in the industry when it comes to running any workload in Kubernetes, and the comfort level to bring critical workloads into the control plane. Every cloud now has a Kubernetes service, and the proliferation of operators has lowered the barrier to entry for most organizations to go all-in on Kubernetes. That includes stateful data workloads. This is something we have witnessed firsthand at DataStax, as our customers increasingly use Kubernetes for the entire stack. &lt;/p&gt;
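&lt;p&gt;StatefulSets are worth a quick illustration, since they are a big part of what makes databases viable on Kubernetes: each pod gets a stable network identity and its own persistent volume that survives restarts. A minimal sketch follows; the image tag and sizes are illustrative, not a production configuration:&lt;/p&gt;

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: db
spec:
  serviceName: db            # headless Service that gives pods stable DNS names
  replicas: 3
  selector:
    matchLabels:
      app: db
  template:
    metadata:
      labels:
        app: db
    spec:
      containers:
        - name: cassandra
          image: cassandra:4.0   # illustrative image tag
          volumeMounts:
            - name: data
              mountPath: /var/lib/cassandra
  volumeClaimTemplates:      # one PersistentVolumeClaim per pod, reattached on restart
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
```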

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpzq7zzk9z3xvxhwb73fm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpzq7zzk9z3xvxhwb73fm.png" alt="Image description" width="466" height="362"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Figure 1&lt;/p&gt;

&lt;p&gt;On the downside, an age-old issue that has plagued fast-moving technology reared its head in the survey results. The survey showed that organizations are still struggling to find qualified people to adopt new technology and experienced people to lead the way. Upgrading to newer technology takes resources, and if older technology isn’t retired, the tail of support needs only grows longer. In many cases, company growth can support the expansion of teams to meet these needs, but hiring can hit a brick wall. Upskilling existing personnel can be a faster way to stay on top of your resourcing needs. The path I have been advocating is transitioning &lt;strong&gt;&lt;a href="https://vmblog.com/archive/2021/10/04/does-the-move-to-cloud-native-spell-the-end-of-the-dba.aspx#.YVsQ41NKh-p" rel="noopener noreferrer"&gt;Database Administrators (DBAs) to Site Reliability Engineers (SREs)&lt;/a&gt;&lt;/strong&gt;, a win-win for everyone involved. I can’t think of a better group of professionals to take on the challenge of data on Kubernetes. &lt;/p&gt;

&lt;p&gt;To help meet that demand, at DataStax we have been running free online workshops that attract thousands of DBAs as they upgrade their careers. We have also &lt;strong&gt;&lt;a href="https://www.datastax.com/dev" rel="noopener noreferrer"&gt;added more certification&lt;/a&gt;&lt;/strong&gt; for both administering and developing applications using Apache Cassandra on Kubernetes. It’s great to see engineers taking advantage of this upgrade path and we hope to see the trend continue. This is a survey data point we’ll be watching closely in the coming years. &lt;/p&gt;

&lt;p&gt;Beyond any need to upgrade technology for the sake of the latest thing, the survey showed some very compelling business-outcome reasons for being all-in on Kubernetes. The volume of digital transformation during the COVID-19 pandemic has blown through any reasonable expectations. Organizations using Kubernetes reported that they were &lt;strong&gt;twice as productive&lt;/strong&gt; after adoption. The aim of Kubernetes has been to reduce the toil for infrastructure engineers. Instead of provisioning hardware and installing software, engineers describe what they need in containerized deployments. &lt;/p&gt;

&lt;p&gt;Doing more with less is the dream of any technology leader trying to bring new or improved products to market. Any advantage in moving faster will keep you ahead of the competition, which is most likely also moving fast. Those of you who have used Kubernetes can understand why this might be the case. Kubernetes handles a lot of what has traditionally been managed by entirely separate teams. A deployment creates a virtual data center that includes default secure networking with certificates, routes, and domain name entries. Much like how the automotive industry used robotics to increase the productivity of assembly lines, infrastructure automation via Kubernetes is delivering on a similar promise. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbatqpcxuxpmo08f8qhvd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbatqpcxuxpmo08f8qhvd.png" alt="Image description" width="800" height="409"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The last point is one that we live every day here at DataStax. When we asked for the important factors about running stateful workloads on Kubernetes, the top answer respondents gave was ensuring consistency. Doing the same thing over and over with no deviation and, more importantly, no surprises! Those that have made the move to Kubernetes have learned that declarative infrastructure is a super-power. Defining the end state of your application and letting Kubernetes ensure that the state is met and maintained. This frees your team to work on the things that build your business by creating applications your customers love. &lt;/p&gt;

&lt;p&gt;At DataStax we have first-hand knowledge of this from our cloud data service, &lt;a href="https://astra.dev/3F0OVGm" rel="noopener noreferrer"&gt;Astra DB&lt;/a&gt;. We rely on Kubernetes every day to run a Cassandra service at scale with the consistency and reliability our customers require. It’s safe to say that we couldn’t do what we do without Kubernetes. So if anyone asks whether you can bet your business on stateful data run on Kubernetes, the survey (and DataStax) says… yes!&lt;/p&gt;

&lt;p&gt;The Data on Kubernetes community is just getting started and we would love to have you &lt;strong&gt;&lt;a href="https://dok.community/" rel="noopener noreferrer"&gt;there&lt;/a&gt;&lt;/strong&gt;. We are looking for more organizations to participate and especially end users. Tell your story. Share your experience. Help others succeed. That’s the best part of a community. &lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
