<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Christopher Bradford</title>
    <description>The latest articles on DEV Community by Christopher Bradford (@bradfordcp).</description>
    <link>https://dev.to/bradfordcp</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F15395%2F472dc096-34fe-4f45-bf33-fe0281793b83.jpeg</url>
      <title>DEV Community: Christopher Bradford</title>
      <link>https://dev.to/bradfordcp</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/bradfordcp"/>
    <language>en</language>
    <item>
      <title>A Case for Databases on Kubernetes from a Former Skeptic</title>
      <dc:creator>Christopher Bradford</dc:creator>
      <pubDate>Thu, 02 Jun 2022 15:32:27 +0000</pubDate>
      <link>https://dev.to/datastax/a-case-for-databases-on-kubernetes-from-a-former-skeptic-5923</link>
      <guid>https://dev.to/datastax/a-case-for-databases-on-kubernetes-from-a-former-skeptic-5923</guid>
      <description>&lt;p&gt;Kubernetes is everywhere. Transactional apps, video streaming services and machine learning workloads are finding a home on this ever-growing platform. But what about databases? If you had asked me this question five years ago, the answer would have been a resounding “&lt;strong&gt;No!&lt;/strong&gt;” — based on my experience in development and operations. In the following years, as more resources emerged for stateful applications, my answer would have changed to “&lt;em&gt;Maybe,”&lt;/em&gt; but always with a qualifier: “It’s fine for development or test environments…” or “If the rest of your tooling is Kubernetes-based, and you have extensive experience…”&lt;/p&gt;

&lt;p&gt;But how about today? Should you run a database on Kubernetes? With complex operations and the requirements of persistent, consistent data, let’s retrace the stages in the journey to my current answer: “In a cloud native environment? &lt;strong&gt;Yes!&lt;/strong&gt;”&lt;/p&gt;

&lt;h2&gt;&lt;strong&gt;Stage 1: Running Stateless Workloads on Kubernetes, But Not Databases!&lt;/strong&gt;&lt;/h2&gt;

&lt;p&gt;When Kubernetes landed on the DevOps scene, I was keen to explore this new platform. My automation was already dialed in with Puppet configuring hosts and Capistrano shuffling my application bits to virtual servers. I had started exploring Docker containers and loved how I no longer had to install and manage services on my developer workstation. I could just fire up a few containers and continue changing the world with &lt;em&gt;my&lt;/em&gt; code.&lt;/p&gt;

&lt;p&gt;Kubernetes made it trivial to deploy these containers to a fleet of servers. It also handled replacing instances as they went down, and keeping a number of replicas online. No more getting paged at all hours! This was &lt;em&gt;great&lt;/em&gt; for stateless services, but what about databases? Kubernetes promised agility, but my databases were tied to a giant boat anchor of data. If I ran a database in a container, would my data be there when the container came back? I didn’t have time to solve this problem, so I fired up a managed RDBMS and moved on to the next feature ticket. Job done.&lt;/p&gt;

&lt;h2&gt;&lt;strong&gt;Stage 2: Running Ephemeral Databases on Kubernetes for Testing&lt;/strong&gt;&lt;/h2&gt;

&lt;p&gt;This question came up again when I needed to run separate instances of an application for QA testing per GitHub pull request (PR). Each PR needed a running app instance &lt;em&gt;and a database&lt;/em&gt;. We couldn’t just run against a shared database, since some of the PRs contained schema changes. I didn’t need a pretty solution, so we ran an instance of the RDBMS in the same &lt;em&gt;pod&lt;/em&gt; as the app and pre-loaded the schema and some data. We tossed a reverse proxy in front of it and spun up the instances on demand. QA was happy as there was no more scheduling of PRs in the test environment, the product team enjoyed feature environments to test drive new functionality, and ops didn’t have to write a bunch of automation. This felt like a completely different situation to me, because I never expected these environments to be anything but ephemeral. It certainly wasn’t cloud native, so I still wasn’t ready to replace my managed database with a Kubernetes-deployed database in production.&lt;/p&gt;
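
&lt;p&gt;That setup could be sketched roughly like the manifest below; the image names, ports, and credentials here are placeholders for illustration, not our actual configuration:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Hypothetical per-PR environment: app and throwaway database in one pod.
# Images, names, and credentials are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: pr-1234-env
  labels:
    app: pr-1234
spec:
  containers:
  - name: app
    image: registry.example.com/myapp:pr-1234
    ports:
    - containerPort: 8080
  - name: db
    image: postgres:14          # data lives inside the container; gone when the pod goes
    env:
    - name: POSTGRES_PASSWORD
      value: throwaway
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;A reverse proxy in front then routes each PR’s hostname to the matching pod’s Service, and tearing down the environment is just deleting the pod.&lt;/p&gt;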

&lt;h2&gt;&lt;strong&gt;Stage 3: Running Cassandra on Kubernetes StatefulSets&lt;/strong&gt;&lt;/h2&gt;

&lt;p&gt;Around this time, I was introduced to Apache Cassandra®. I was amazed by this high-performance database with a phenomenal operations story. A database that could support losing instances? Sign me up! My hopes of running a database on Kubernetes came roaring back. Could Cassandra deal with the ephemeral nature of containers? At the time, it felt like a begrudging “&lt;em&gt;I guess?&lt;/em&gt;”. It seemed possible, but there were significant gaps in the tooling. To take this to production, I’d need a team of Kubernetes &lt;em&gt;and&lt;/em&gt; Cassandra veterans, plus a suite of tooling and runbooks to fill in the operational gaps. Still, a number of teams were successfully running Cassandra in containers. I fondly recall a webinar by Instaclustr talking about running &lt;a href="https://www.youtube.com/watch?v=rhqSmc9meMw" rel="noopener noreferrer"&gt;Cassandra on CoreOS&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;In parallel, a number of Kubernetes ecosystem changes started to solidify. StatefulSets handle the creation of pods with persistent storage according to a predictable naming scheme. The persistent volume API and the container storage interface (CSI) allow for loose coupling between compute and storage. In some cases, it’s even possible to define storage that follows the application as it is rescheduled around the cluster.&lt;/p&gt;
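
&lt;p&gt;As a sketch of how these pieces fit together (names, image, and sizes here are illustrative), a StatefulSet with a volume claim template gives each pod a stable identity and its own CSI-provisioned volume:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Illustrative StatefulSet: stable pod names plus one PVC per pod.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: cassandra
spec:
  serviceName: cassandra        # gives pods predictable names: cassandra-0, cassandra-1, ...
  replicas: 3
  selector:
    matchLabels:
      app: cassandra
  template:
    metadata:
      labels:
        app: cassandra
    spec:
      containers:
      - name: cassandra
        image: cassandra:4.0
        volumeMounts:
        - name: data
          mountPath: /var/lib/cassandra
  volumeClaimTemplates:         # one PersistentVolumeClaim per pod, provisioned via CSI
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: standard   # placeholder; any CSI-backed storage class
      resources:
        requests:
          storage: 100Gi
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;When cassandra-1 is rescheduled, it comes back with the same name and reattaches the same volume, which is exactly the behavior a database needs.&lt;/p&gt;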

&lt;p&gt;Storage is the core of every database. In a containerized database, data may be stored within the container itself or mounted externally. Using external storage makes it possible to switch the container out to change configuration or upgrade software, while keeping the data intact. Cassandra is already capable of leveraging high performance local storage, but the flexibility of modern CSI implementations means data volumes are moved to new workers as pods are rescheduled. This reduces the time to recovery, as data no longer has to be synced between hosts in the case of a worker failure.&lt;/p&gt;

&lt;h2&gt;&lt;strong&gt;Stage 4: A Kubernetes Operator for Cassandra&lt;/strong&gt;&lt;/h2&gt;

&lt;p&gt;With straightforward deployment of Cassandra nodes to pods, resilient handling of data volumes and a Kubernetes control plane that works to keep everything running, what more could we ask for? At this point I encountered the collision of two separate distributed systems that were developed independently of each other. The way Kubernetes provisions pods and starts services does &lt;strong&gt;not&lt;/strong&gt; align with the operational steps needed for the care and feeding of a Cassandra cluster; there’s a gap that must be bridged between Kubernetes workflows and Cassandra runbooks.&lt;/p&gt;

&lt;p&gt;Kubernetes provides a number of built-in resources, from a simple building block like a Pod to higher-level abstractions such as a Deployment. These resources let users define their requirements, and Kubernetes provides control loops to ensure that the running state matches the target state. A control loop takes short incremental actions to nudge the orchestrated components toward a desired end state, such as restarting a pod or creating a DNS entry. However, domains like distributed databases require more complex sequences of actions that don’t fit neatly within these predefined resources.&lt;/p&gt;

&lt;p&gt;Kubernetes Custom Resources were created to allow the Kubernetes API to be extended for domain-specific logic, by defining new resource types and controllers. OSS frameworks like &lt;a href="https://sdk.operatorframework.io/" rel="noopener noreferrer"&gt;operator-sdk&lt;/a&gt;, &lt;a href="https://github.com/kubernetes-sigs/kubebuilder" rel="noopener noreferrer"&gt;kubebuilder&lt;/a&gt; and &lt;a href="https://juju.is/" rel="noopener noreferrer"&gt;juju&lt;/a&gt; were created to simplify the creation of custom resources and their controllers. Tools built with these frameworks came to be known as Operators.&lt;/p&gt;

&lt;p&gt;As these powerful new tools became available, I joined the effort to codify the Cassandra logical domain and operational runbooks in the cass-operator project. Cass-operator defines the CassandraDatacenter custom resource and provides the glue between projects including the management API, cass-config-builder and others, to provide a cohesive Cassandra experience on Kubernetes.&lt;/p&gt;
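
&lt;p&gt;As a sketch, a CassandraDatacenter resource looks roughly like the following; the exact fields depend on your cass-operator version, and the names and sizes here are illustrative:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Rough sketch of a CassandraDatacenter custom resource.
# Field names follow cass-operator's v1beta1 API; check your version.
apiVersion: cassandra.datastax.com/v1beta1
kind: CassandraDatacenter
metadata:
  name: dc1
spec:
  clusterName: cluster1
  serverType: cassandra
  serverVersion: "4.0.1"
  size: 3                          # desired number of Cassandra nodes
  storageConfig:
    cassandraDataVolumeClaimSpec:
      storageClassName: standard   # placeholder storage class
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 100Gi
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;From this single resource, the operator derives the stateful sets, services, seed configuration and orchestrated bootstrap, applying Cassandra runbook logic at each step.&lt;/p&gt;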

&lt;p&gt;With cass-operator, we spend less time thinking about pods, stateful sets, persistent volumes, or even the tedious tasks of bootstrapping and scaling clusters, and more time thinking about our applications.&lt;/p&gt;

&lt;h2&gt;&lt;strong&gt;Stage Now: Running a Full Data Platform with K8ssandra&lt;/strong&gt;&lt;/h2&gt;

&lt;p&gt;The next iteration in this cycle, &lt;a href="https://k8ssandra.io/" rel="noopener noreferrer"&gt;K8ssandra&lt;/a&gt;, elevates us further away from the individual components. Instead of looking at the Cassandra Datacenters, we can consider our data platform holistically: not just the database, but also supporting services including monitoring, backups and APIs. We can ask Kubernetes for a data platform by executing a simple Helm install command; and a suite of operators kick in to provision and manage all of the pieces.&lt;/p&gt;
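
&lt;p&gt;For illustration, an install might start from a values file like the one below, applied with something like &lt;code&gt;helm install k8ssandra k8ssandra/k8ssandra -f values.yaml&lt;/code&gt;; the chart and key names vary across K8ssandra releases, so treat this as a sketch rather than a recipe:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Sketch of a K8ssandra Helm values file; key names vary by release.
cassandra:
  version: "4.0.1"
  datacenters:
  - name: dc1
    size: 3
stargate:
  enabled: true            # REST / GraphQL / Document APIs
reaper:
  enabled: true            # scheduled repairs
medusa:
  enabled: true            # backups (needs bucket credentials in practice)
kube-prometheus-stack:
  enabled: true            # metrics and dashboards
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;One values file, one install command, and the operators reconcile the database plus its supporting services.&lt;/p&gt;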

&lt;p&gt;Looking back at the pitfalls of running databases on Kubernetes I encountered several years ago, most of them have been resolved. Starting with a foundational technology like Cassandra takes care of our availability concerns: data is replicated and it’s smart enough to deal with shuffling data around as peers come and go. The Kubernetes API has matured to include custom resources and advanced stateful components (like persistent volumes and stateful sets). Cass-operator acts as a Rosetta Stone, providing the wealth of knowledge needed to stitch the terms of Cassandra and Kubernetes together. Finally, K8ssandra takes us to the next level with a complete cohesive experience.&lt;/p&gt;

&lt;p&gt;All of these problems are &lt;strong&gt;&lt;em&gt;hard&lt;/em&gt;&lt;/strong&gt; and require technical finesse and careful thinking. Without the right pieces in place, we’d end up resigning databases on Kubernetes to a niche role in our infrastructure and wasting the effort of the innovative engineers who have built out all of these pieces and runbooks. Fortunately, each of these problems has been met and bested. Should you run your database in Kubernetes? &lt;em&gt;Definitely.&lt;/em&gt; If you'd like to play with Cassandra quickly off K8s, try the managed &lt;a href="https://astra.dev/3N0RQns" rel="noopener noreferrer"&gt;DataStax Astra DB&lt;/a&gt;, which is built on Apache Cassandra.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Managing Distributed Applications in Kubernetes Using Cilium and Istio with Helm and Operator for Deployment</title>
      <dc:creator>Christopher Bradford</dc:creator>
      <pubDate>Fri, 01 Apr 2022 19:01:47 +0000</pubDate>
      <link>https://dev.to/datastax/managing-distributed-applications-in-kubernetes-using-cilium-and-istio-with-helm-and-operator-for-deployment-428g</link>
      <guid>https://dev.to/datastax/managing-distributed-applications-in-kubernetes-using-cilium-and-istio-with-helm-and-operator-for-deployment-428g</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0le6ipr8gn962xie1irv.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0le6ipr8gn962xie1irv.jpeg" alt="Image description" width="800" height="449"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This post will show you the benefits of managing your distributed applications with Kubernetes in cross-cloud, multi-cloud, and hybrid cloud scenarios using Cilium and Istio with Helm and Operator for deployment.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In our recent post on &lt;a href="https://thenewstack.io/taking-your-database-beyond-a-single-kubernetes-cluster/" rel="noopener noreferrer"&gt;The New Stack&lt;/a&gt;, we showed you how you can leverage &lt;a href="https://kubernetes.io/" rel="noopener noreferrer"&gt;Kubernetes&lt;/a&gt; (K8s) and &lt;a href="https://cassandra.apache.org/_/index.html" rel="noopener noreferrer"&gt;Apache Cassandra&lt;/a&gt;™ to manage distributed applications at scale, with thousands of nodes both on-premises and in the cloud. In that example, we used &lt;a href="https://dtsx.io/3pgqEIe" rel="noopener noreferrer"&gt;K8ssandra&lt;/a&gt; and &lt;a href="https://cloud.google.com/" rel="noopener noreferrer"&gt;Google Cloud Platform&lt;/a&gt; (GCP) to illustrate some of the challenges you might expect to encounter as you grow into a multi-cloud environment, upgrade to another K8s version, or begin working with different distributions and complementary tooling. In this post, we’ll explore a few alternative approaches to using K8s to help you more easily manage distributed applications.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://www.cncf.io/" rel="noopener noreferrer"&gt;Cloud Native Computing Foundation&lt;/a&gt; (CNCF) provides many different options for managing your distributed applications, and many open-source projects have come a long way in alleviating the pain points for developers working in cross-cloud, multi-cloud, and hybrid cloud scenarios.&lt;/p&gt;

&lt;p&gt;In this post, we’ll focus on two additional approaches that we think work well:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Using a container network interface (&lt;a href="https://cilium.io/" rel="noopener noreferrer"&gt;Cilium&lt;/a&gt;) and service mesh (&lt;a href="https://istio.io/" rel="noopener noreferrer"&gt;Istio&lt;/a&gt;) on top of your K8s infrastructure to more easily manage your distributed applications.&lt;/li&gt;
&lt;li&gt;Using &lt;a href="https://helm.sh/" rel="noopener noreferrer"&gt;Helm&lt;/a&gt; and the &lt;a href="https://github.com/operator-framework" rel="noopener noreferrer"&gt;Operator Framework&lt;/a&gt; to deploy them in a cloud-native way.&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;Running Istio and Cilium side by side&lt;/h1&gt;

&lt;p&gt;In &lt;a href="https://thenewstack.io/taking-your-database-beyond-a-single-kubernetes-cluster/" rel="noopener noreferrer"&gt;our first post&lt;/a&gt; on the topic of how to leverage K8s and Cassandra to manage distributed applications at scale, we discussed the use of DNS stubs to handle routing between our Cassandra data centers. However, another approach is to run a mix of global Istio services and Cilium global services side by side.&lt;/p&gt;

&lt;p&gt;Cilium provides a single zone of connectivity (a control plane) that facilitates the management and orchestration of applications across the cloud environment. Istio is an open-source, language-independent service networking layer (a service mesh) that supports communication and data sharing between different microservices within a cloud environment.&lt;/p&gt;

&lt;p&gt;Cilium’s global services are reachable from all Istio-managed services, as they can be discovered via DNS just like regular services. Pod IP routing is the foundation of this multi-cluster ability: it allows pods across clusters to reach each other via their pod IPs. Cilium can operate in several modes to perform pod IP routing, all of which are capable of multi-cluster pod IP routing.&lt;/p&gt;
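
&lt;p&gt;With Cilium ClusterMesh, for example, a service can be declared global via an annotation, so identically named services in each cluster share their backends; the annotation below follows the Cilium docs, though its exact name has varied across Cilium versions, and the service itself is illustrative:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Illustrative global service: apply the same manifest in each meshed cluster.
apiVersion: v1
kind: Service
metadata:
  name: cassandra
  annotations:
    service.cilium.io/global: "true"   # older releases used io.cilium/global-service
spec:
  selector:
    app: cassandra
  ports:
  - port: 9042
&lt;/code&gt;&lt;/pre&gt;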

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2sp33u2892lizsgjrgzk.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2sp33u2892lizsgjrgzk.jpg" alt="Image description" width="800" height="449"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Figure 1: Cilium control plane for managing and orchestrating applications across the cloud environment.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fedc61jxvegj7dob0p12h.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fedc61jxvegj7dob0p12h.jpg" alt="Image description" width="800" height="449"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Figure 2: Istio service networking layer (service mesh) to support communication and data sharing between different microservices within the cloud environment.&lt;/p&gt;

&lt;p&gt;You may already be using one of these tools. If you are, you can add one on top of the other to extend their benefits. For example, if you already have Istio deployed, you can add Cilium on top of it. Pod IP routing is the foundation of multi-cluster capabilities, and both of these tools provide that functionality today. The goal here is to streamline pod-to-pod connectivity and ensure that pods are able to perform multi-cluster IP routing.&lt;/p&gt;

&lt;p&gt;We can do this with overlay networks, in which we can tunnel all of this through encapsulation. With overlay networks, you can build out a separate IP address space for your application, which in our example &lt;a href="https://thenewstack.io/taking-your-database-beyond-a-single-kubernetes-cluster/" rel="noopener noreferrer"&gt;here&lt;/a&gt; is a Cassandra database. Then you would run that on top of the existing Kube network leveraging proxies, sidecars, and gateways. We won’t go too far into that in this post, but we have some great content on &lt;a href="https://dtsx.io/3vvs9TZ" rel="noopener noreferrer"&gt;how to connect stateful workloads across K8s clusters&lt;/a&gt; that will show you at a high level how to do that.&lt;/p&gt;

&lt;p&gt;Tunneling mode in Cilium &lt;a href="https://docs.cilium.io/en/v1.8/concepts/networking/routing/" rel="noopener noreferrer"&gt;encapsulates&lt;/a&gt; all network packets emitted by pods in a so-called encapsulation header. The encapsulation header can consist of a &lt;a href="https://en.wikipedia.org/wiki/Virtual_Extensible_LAN" rel="noopener noreferrer"&gt;VXLAN&lt;/a&gt; or &lt;a href="https://en.wikipedia.org/wiki/Generic_Network_Virtualization_Encapsulation" rel="noopener noreferrer"&gt;Geneve frame&lt;/a&gt;. This encapsulation frame is then transmitted via a standard &lt;a href="https://en.wikipedia.org/wiki/User_Datagram_Protocol" rel="noopener noreferrer"&gt;User Datagram Protocol&lt;/a&gt; (UDP) packet header. The concept is similar to a VPN tunnel.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Advantage:&lt;/strong&gt; The pod IPs are never visible on the underlying network; the network only sees the IP addresses of the worker nodes. This can simplify installation and firewall rules.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Disadvantage:&lt;/strong&gt; The additional network headers required will reduce the theoretical maximum throughput of the network. The exact cost will depend on the configured maximum transmission unit (MTU) and will be more noticeable when using a traditional MTU of 1500 compared to the use of jumbo frames at MTU 9000.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Disadvantage:&lt;/strong&gt; To avoid excessive CPU overhead, the entire networking stack, including the underlying hardware, has to support checksum and segmentation offload, so that checksums and segmentation are handled in hardware just as for “regular” network packets. This offload functionality is widely available these days.&lt;/li&gt;
&lt;/ul&gt;
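
&lt;p&gt;As a sketch, tunneling is selected through Cilium’s configuration, shown here as Helm values; the key names have changed across Cilium releases (newer versions use &lt;code&gt;routingMode&lt;/code&gt; and &lt;code&gt;tunnelProtocol&lt;/code&gt;), so treat this as illustrative:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Sketch of Cilium Helm values: encapsulation mode and cluster identity.
# Key names vary by Cilium version.
tunnel: vxlan        # or "geneve"
cluster:
  name: cluster1
  id: 1              # must be unique per cluster when using ClusterMesh
&lt;/code&gt;&lt;/pre&gt;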

&lt;p&gt;The takeaway message here is that there are a lot of options in the container networking interface (CNI) space, and in service mesh and service discovery, that can eliminate most, if not all, of the heavy lifting around DNS service discovery and the end-to-end connectivity you need to effectively manage your distributed applications.&lt;/p&gt;

&lt;p&gt;These products not only provide all of that functionality bundled up into a single solution (or maybe a couple of solutions), but they also offer some pretty big benefits over simply using DNS stubs. With DNS stubs, you still have to manually configure your DNS and IP routing, map it all out and document it, and then automate and orchestrate it all. These products, by contrast, offer observability, ease of management, and, most importantly, a Zero Trust architecture, which would be nearly impossible to achieve with a DNS-only solution.&lt;/p&gt;

&lt;h1&gt;Added Benefits&lt;/h1&gt;

&lt;p&gt;Cilium has done a great job creating a plug-in architecture that runs on top of &lt;a href="https://ebpf.io/" rel="noopener noreferrer"&gt;eBPF&lt;/a&gt;. This provides application-level visibility that allows you to start creating policies that go beyond what you may have seen or leveraged before. For example, say you want to create a firewall rule to ensure that your application can only talk to a specific Cassandra server. You can now take that down a few notches and create a rule that allows read-only access or restricts access to specific records or tables. That’s just not something that’s possible with the tooling we’ve used in the past, whether VPNs or firewalls.&lt;/p&gt;
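
&lt;p&gt;The Cilium Cassandra guide linked in the resources below demonstrates this with an L7 policy; the sketch here uses invented labels and table names, and Cilium’s Cassandra protocol parser is a beta feature, so check the docs for your version:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Sketch of a CiliumNetworkPolicy restricting one app to read-only
# access on a single Cassandra table; selectors and names are invented.
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: cassandra-read-only
spec:
  endpointSelector:
    matchLabels:
      app: cass-server
  ingress:
  - fromEndpoints:
    - matchLabels:
        app: reporting-app
    toPorts:
    - ports:
      - port: "9042"
        protocol: TCP
      rules:
        l7proto: cassandra        # beta protocol parser
        l7:
        - query_action: "select"  # allow reads only
          query_table: "app_keyspace.records"
&lt;/code&gt;&lt;/pre&gt;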

&lt;p&gt;The other thing is that all of this has created a lot of complexity and “Kubeception” around layers upon layers of overlay networks. So, it can be challenging to ensure you have visibility and to properly instrument everything, especially if you’re managing DNS on your own. You’ll also have to start collecting logs, gathering metrics, creating dashboards, and doing other things that together add a lot of additional overhead.&lt;/p&gt;

&lt;p&gt;However, if you look at projects like &lt;a href="https://github.com/cilium/hubble" rel="noopener noreferrer"&gt;Cilium Hubble&lt;/a&gt; and &lt;a href="https://github.com/istio/istio/tree/master/galley" rel="noopener noreferrer"&gt;Istio Galley&lt;/a&gt;, you can see that you not only get all the instrumentation to manage this stuff out of the box, but you also get observability into the health of your pods and fine-grained visibility that you won’t get with traditional tools.&lt;/p&gt;

&lt;p&gt;This observability is a huge advantage because it allows you to also instrument on the monitoring side to build out powerful metrics reporting with tools that can tightly integrate with &lt;a href="https://prometheus.io/" rel="noopener noreferrer"&gt;Prometheus&lt;/a&gt;. Once you do this, you can get metric data on the connectivity between all of your pods and applications and determine where there may be latency as well as what policy is potentially being impacted.&lt;/p&gt;

&lt;p&gt;Of course, the ability to instrument all this isn’t new. We’ve probably all been there and done that, collecting logs to some central log aggregator, building custom searches, etc. But with these services, we can now get this out of the box.&lt;/p&gt;

&lt;h1&gt;Deployment with Helm and the Operator Framework&lt;/h1&gt;

&lt;p&gt;So how do we get from all the great things we’ve talked about here to actually deploying your applications into a cloud, multi-cloud, or hybrid cloud environment?&lt;/p&gt;

&lt;p&gt;Since you’re no longer working in a single region or cluster, there’s going to be a bit of juggling involved. You might be pushing manifests and resources to each cluster one by one. Or maybe you’re templating things out and using tools like Helm, or perhaps GitOps or other pipeline tools, to make sure you’re staging appropriately and working through different environments. But really, there’s still a lot more required when you’re working on multi-cluster deployments.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fik106j45r3hi8i3mgzih.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fik106j45r3hi8i3mgzih.png" alt="Image description" width="800" height="259"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So one example here is &lt;a href="https://helm.sh/" rel="noopener noreferrer"&gt;Helm&lt;/a&gt;. If you’re using Helm, you’re going to have a release per cluster, which means you’ll have to manage switching between those various contexts and make sure you’re upgrading the right way. In case things go sideways, you’ll also need to know how to stage or roll back a change before you switch over and do operations in the other cluster or region. And when you go beyond two regions, there’s even more complexity.&lt;/p&gt;

&lt;p&gt;Now I’d like to call out the &lt;a href="https://operatorframework.io/" rel="noopener noreferrer"&gt;Operator Framework&lt;/a&gt; here, and more specifically the &lt;a href="https://sdk.operatorframework.io/" rel="noopener noreferrer"&gt;Operator SDK&lt;/a&gt; and the individual operators that make up a number of the things we’ve covered here.&lt;/p&gt;

&lt;p&gt;Some of these tools are really starting to level up with multi-cluster functionality. In some cases you’re running an instance of the operator inside each cluster, and those instances communicate and coordinate locking when they go to perform various actions. In other cases, you might have a control plane where you’re running the operator and it’s reconciling resources in the downstream clusters.&lt;/p&gt;

&lt;p&gt;Maybe we have an Ops K8s cluster, or maybe just &lt;a href="https://cloud.google.com/about/locations#network" rel="noopener noreferrer"&gt;us-west4&lt;/a&gt; is running the operator, and it’s communicating with the &lt;a href="https://www.redhat.com/en/topics/containers/what-is-the-kubernetes-API" rel="noopener noreferrer"&gt;Kube API&lt;/a&gt; in &lt;a href="https://cloud.google.com/about/locations#americas" rel="noopener noreferrer"&gt;us-east1&lt;/a&gt;. We’re currently doing that in the K8ssandra project, where we’re moving from Helm charts to an operator that holds kubeconfigs and the credentials to talk to remote API servers and reconcile resources across those boundaries. We do this because some operations need to happen serially.&lt;/p&gt;

&lt;p&gt;If a node is down in one data center, we may not want to perform certain operations in another data center. Having operators that can communicate across those cluster boundaries can be really advantageous, especially when you’re talking about orchestration.&lt;/p&gt;

&lt;h1&gt;Spare yourself some pain by planning your deployment&lt;/h1&gt;

&lt;p&gt;The conversation we started on &lt;a href="https://thenewstack.io/taking-your-database-beyond-a-single-kubernetes-cluster/" rel="noopener noreferrer"&gt;The New Stack blog&lt;/a&gt; and have continued here has focused a lot on manually managing things versus having cloud-native technologies that can manage them for us, whether that be service discovery or routing tables, or even just adjusting the packet in flight to indicate what cluster they need to go to and eventually, what pod they need to reach.&lt;/p&gt;

&lt;p&gt;When you think through the application of these technologies and how you might best use them to manage your distributed applications, the single most important takeaway we’d like to leave you with is…&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;You need to plan your deployments before you start spinning up your K8s clusters.&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Having the right people together to hash out your approach before you wade in will help you identify any limits in your system and other important factors that need to be considered. For example, maybe you have a scarcity of IP addresses. Maybe you’re running one big cluster, and now you’re talking about many small clusters. Or maybe you run clusters more along business lines or for certain Ops teams.&lt;/p&gt;

&lt;p&gt;How are you going to start to venture into this multi-cluster multi-region space and ultimately, how are you going to build the plumbing and the pipes between those systems so they can communicate with each other?&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa9afi5sujn4ftqvgv8rm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa9afi5sujn4ftqvgv8rm.png" alt="Image description" width="800" height="377"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Theoretically, a single team could do this planning. But that’s probably not going to turn out well. It’s far more likely that you’ll need to involve several teams, including people from operations and people who run the cloud accounts. If you’re operating in a hybrid or multi-cloud environment, you’ll probably also have some network people involved, too. For example, there may be firewalls that need to be adjusted in certain ways.&lt;/p&gt;

&lt;p&gt;Planning your approach upfront is enormously beneficial and will help you avoid some pretty big problems when you move into implementation. For example, it can be very difficult to make changes once you’ve launched your cluster because you can’t just change the &lt;a href="https://en.wikipedia.org/wiki/Classless_Inter-Domain_Routing" rel="noopener noreferrer"&gt;Classless Inter-Domain Routing&lt;/a&gt; (CIDR) (the IP address space) your pods are running in at that point. You would instead need to migrate them. By doing some of this planning upfront, you can avoid this and a lot of other unfortunate situations.&lt;/p&gt;
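
&lt;p&gt;For example, pod and service CIDRs are fixed at cluster creation time, shown here as a kubeadm sketch, and they must be planned so they don’t overlap across the clusters you intend to connect:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# kubeadm ClusterConfiguration excerpt: these ranges cannot be changed
# in place later, so plan non-overlapping CIDRs per cluster up front.
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
networking:
  podSubnet: 10.10.0.0/16      # cluster A; give cluster B e.g. 10.11.0.0/16
  serviceSubnet: 10.96.0.0/12
&lt;/code&gt;&lt;/pre&gt;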

&lt;p&gt;Curious to learn more about (or play with) Cassandra itself? We recommend trying it on the &lt;a href="https://astra.dev/3r9ONAz" rel="noopener noreferrer"&gt;Astra DB&lt;/a&gt; free plan for the fastest setup.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Follow the &lt;a href="https://dtsx.io/3E7RiHF" rel="noopener noreferrer"&gt;DataStax Tech Blog&lt;/a&gt; for more developer stories. Check out our &lt;a href="https://dtsx.io/3AZohMk" rel="noopener noreferrer"&gt;YouTube&lt;/a&gt; channel for tutorials, and follow DataStax Developers on &lt;a href="https://dtsx.io/3AZohMk" rel="noopener noreferrer"&gt;Twitter&lt;/a&gt; for the latest news about our developer community.&lt;/em&gt;&lt;/p&gt;

&lt;h1&gt;Resources&lt;/h1&gt;

&lt;ol&gt;
&lt;li&gt;&lt;a href="https://thenewstack.io/taking-your-database-beyond-a-single-kubernetes-cluster/" rel="noopener noreferrer"&gt;Taking Your Database Beyond a Single Kubernetes Cluster&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://kubernetes.io/" rel="noopener noreferrer"&gt;Kubernetes&lt;/a&gt; (K8s)&lt;/li&gt;
&lt;li&gt;&lt;a href="https://cassandra.apache.org/_/index.html" rel="noopener noreferrer"&gt;Apache Cassandra&lt;/a&gt;™&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dtsx.io/3pgqEIe" rel="noopener noreferrer"&gt;K8ssandra&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://cloud.google.com/" rel="noopener noreferrer"&gt;Google Cloud Platform&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.cncf.io/" rel="noopener noreferrer"&gt;The Cloud Native Computing Foundation&lt;/a&gt; (CNCF)&lt;/li&gt;
&lt;li&gt;&lt;a href="https://cilium.io/" rel="noopener noreferrer"&gt;Cilium&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.cilium.io/en/v1.8/concepts/networking/routing/" rel="noopener noreferrer"&gt;Cilium Docs: Routing and Encapsulation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.cilium.io/en/v1.8/gettingstarted/cassandra/" rel="noopener noreferrer"&gt;Cilium Guides: How to Secure a Cassandra Database&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://cilium.io/blog/2019/03/12/clustermesh" rel="noopener noreferrer"&gt;Deep Dive into Cilium Multi-Cluster&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://istio.io/" rel="noopener noreferrer"&gt;Istio&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://istio.io/latest/docs/setup/install/multicluster/" rel="noopener noreferrer"&gt;Istio Multi-cluster Guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://helm.sh/" rel="noopener noreferrer"&gt;Helm Charts&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/operator-framework" rel="noopener noreferrer"&gt;Operator Framework GitHub Organization&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dtsx.io/3vvs9TZ" rel="noopener noreferrer"&gt;How to Connect Stateful Workloads Across Kubernetes Clusters&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://en.wikipedia.org/wiki/Virtual_Extensible_LAN" rel="noopener noreferrer"&gt;Virtual Extensible LAN&lt;/a&gt; (VXLAN)&lt;/li&gt;
&lt;li&gt;&lt;a href="https://en.wikipedia.org/wiki/Generic_Network_Virtualization_Encapsulation" rel="noopener noreferrer"&gt;Generic Network Virtualization Encapsulation&lt;/a&gt; (Geneve)&lt;/li&gt;
&lt;li&gt;&lt;a href="https://en.wikipedia.org/wiki/User_Datagram_Protocol" rel="noopener noreferrer"&gt;User Datagram Protocol&lt;/a&gt; (UDP)&lt;/li&gt;
&lt;li&gt;&lt;a href="https://en.wikipedia.org/wiki/Classless_Inter-Domain_Routing" rel="noopener noreferrer"&gt;Classless Inter-Domain Routing&lt;/a&gt; (CIDR)&lt;/li&gt;
&lt;li&gt;&lt;a href="https://ebpf.io/" rel="noopener noreferrer"&gt;eBPF&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/cilium/hubble" rel="noopener noreferrer"&gt;Cilium Hubble GitHub Repo&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/istio/istio/tree/master/galley" rel="noopener noreferrer"&gt;Istio Galley GitHub Repo&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://prometheus.io/" rel="noopener noreferrer"&gt;Prometheus&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://operatorframework.io/" rel="noopener noreferrer"&gt;Operator Framework&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://sdk.operatorframework.io/" rel="noopener noreferrer"&gt;Operator SDK&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.redhat.com/en/topics/containers/what-is-the-kubernetes-API" rel="noopener noreferrer"&gt;What is the Kubernetes API?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://cloud.google.com/about/locations#network" rel="noopener noreferrer"&gt;Global Locations — Regions &amp;amp; Zones&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

</description>
    </item>
    <item>
      <title>Taking Your Database Beyond a Single Kubernetes Cluster</title>
      <dc:creator>Christopher Bradford</dc:creator>
      <pubDate>Wed, 30 Mar 2022 12:37:58 +0000</pubDate>
      <link>https://dev.to/datastax/taking-your-database-beyond-a-single-kubernetes-cluster-59gc</link>
      <guid>https://dev.to/datastax/taking-your-database-beyond-a-single-kubernetes-cluster-59gc</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6iq67ou33ynu0tqzf2ez.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6iq67ou33ynu0tqzf2ez.jpg" alt="Image description" width="800" height="418"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;By &lt;a href="https://www.linkedin.com/in/bradfordcp/" rel="noopener noreferrer"&gt;Christopher Bradford&lt;/a&gt; and &lt;a href="https://www.linkedin.com/in/ty-morton-2b55b82/" rel="noopener noreferrer"&gt;Ty Morton&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Global applications need a data layer that is as distributed as the users they serve. &lt;a href="https://cassandra.apache.org/_/index.html" rel="noopener noreferrer"&gt;Apache Cassandra&lt;/a&gt; has risen to this challenge, handling data needs for the likes of Apple, Netflix and Sony. Traditionally, managing data layers for a distributed application was handled with dedicated teams to manage the deployment and operations of thousands of nodes — both on-premises and in the cloud.&lt;/p&gt;

&lt;p&gt;To alleviate much of the load felt by DevOps teams, we evolved a number of these practices and patterns in &lt;a href="https://k8ssandra.io/" rel="noopener noreferrer"&gt;K8ssandra&lt;/a&gt;, leveraging the common control plane afforded by &lt;a href="https://kubernetes.io/" rel="noopener noreferrer"&gt;Kubernetes&lt;/a&gt; (K8s). There has been a catch, though: running a database (or indeed any application) across multiple regions or K8s clusters is tricky without proper care and planning up front.&lt;/p&gt;

&lt;p&gt;To show you how we did this, let’s start by looking at a single region K8ssandra deployment running on a lone K8s cluster. It is made up of six Cassandra nodes spread across three availability zones within that region, with two Cassandra nodes in each availability zone. In this example, we’ll use the &lt;a href="https://cloud.google.com/" rel="noopener noreferrer"&gt;Google Cloud Platform&lt;/a&gt; (GCP) zone name. However, our example here could just as easily apply to other clouds or even on-prem.&lt;/p&gt;

&lt;p&gt;Here’s where we are now:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc23klqa6krcwnhbh5d4i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc23klqa6krcwnhbh5d4i.png" alt="Image description" width="450" height="376"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Existing deployment of our cloud database.&lt;/p&gt;

&lt;p&gt;The goal is to have two regions, each with a Cassandra data center. In our cloud-managed K8s deployment here, this translates to two K8s clusters — each with a separate control plane, but utilizing a common virtual private cloud (VPC) network. By expanding our Cassandra cluster into multiple data centers, we have redundancy in case of a regional outage, as well as improved response times and latencies to our client applications given local access to data.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkqwr0mmvydd8ykevs1g9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkqwr0mmvydd8ykevs1g9.png" alt="Image description" width="800" height="335"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is our goal: to have two regions, each with their own Cassandra data center.&lt;/p&gt;

&lt;p&gt;On the surface, it would seem like we could achieve this by simply spinning up another K8s cluster and deploying the same K8s &lt;a href="https://www.redhat.com/en/topics/automation/what-is-yaml" rel="noopener noreferrer"&gt;YAML&lt;/a&gt;. Then just add a couple of tweaks for &lt;a href="https://cloud.google.com/about/locations#network" rel="noopener noreferrer"&gt;Availability Zone&lt;/a&gt; names and we can call it done, right? Ultimately the shape of the resources is &lt;em&gt;very&lt;/em&gt; similar, and it’s all K8s objects. So, shouldn’t this just work? Well, &lt;em&gt;maybe&lt;/em&gt;. Depending on your environment, this approach &lt;em&gt;might&lt;/em&gt; work.&lt;/p&gt;

&lt;p&gt;If you’re really lucky, you may be a firewall rule away from a fully distributed database deployment. Unfortunately, it’s rarely that simple. Even if some of these hurdles are easily cleared, there are plenty of other innocuous things that can go wrong and lead to a degraded state. Your choice of cloud provider, K8s distro, command-line flag, and yes, even DNS — these can all potentially lead you down a dark and stormy path. So, let’s explore some of the most common issues you might run into, so you can avoid them.&lt;/p&gt;

&lt;h1&gt;Common hurdles on the race to scale&lt;/h1&gt;

&lt;p&gt;Even if your deployment seems to be working well initially, you will likely encounter a hurdle or two as you grow into a multicloud environment, upgrade to another K8s version, or begin working with different distributions and complementary tooling. When it comes to distributed databases, there’s a lot more under the hood. Understanding what K8s is doing to enable running containers across a fleet of hardware will help you develop advanced solutions — and ultimately, something that fits your exact needs.&lt;/p&gt;

&lt;h1&gt;The need for unique IP addresses for your Cassandra nodes&lt;/h1&gt;

&lt;p&gt;One of the first hurdles you might run into involves basic networking. Going back to our first cluster, let’s take a look at the layers of networking involved.&lt;/p&gt;

&lt;p&gt;In our VPC shown below, we have a Classless Inter-Domain Routing (CIDR) range representing the addresses for the K8s worker instances. Within the scope of the K8s cluster there is a separate address space where pods operate and containers run. A pod is a collection of containers that have shared resources — such as storage, networking, and process space.&lt;/p&gt;

&lt;p&gt;In some cloud environments, these subnets are tied to specific availability zones. So, you might have a CIDR range for each subnet your K8s workers are launched into. You may also have other virtual machines within your VPC, but in this example we’ll stick with K8s being the only tenant.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzn5v5toj9bqp997u2f68.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzn5v5toj9bqp997u2f68.png" alt="Image description" width="565" height="295"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;CIDR ranges used by a VPC with a K8s layer&lt;/p&gt;

&lt;p&gt;In our example, we have 10.100.x.x for the nodes and 10.200.x.x for the K8s level. Each of the K8s workers gets a slice of the 10.200.x.x CIDR range for the pods that are running on that individual instance.&lt;/p&gt;
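&lt;p&gt;As a sketch of how that slicing works (the /24-per-worker layout here is an assumption for illustration; real clusters assign pod ranges via their CNI configuration), Python’s standard ipaddress module can carve the pod range into per-node blocks:&lt;/p&gt;

```python
import ipaddress

# Pods draw from 10.200.x.x, as in the example above; assume each
# worker node receives its own /24 slice for the pods it hosts.
pod_cidr = ipaddress.ip_network("10.200.0.0/16")
node_slices = list(pod_cidr.subnets(new_prefix=24))

print(node_slices[0])     # 10.200.0.0/24 -> pods on worker 0
print(node_slices[1])     # 10.200.1.0/24 -> pods on worker 1
print(len(node_slices))   # 256 workers can be addressed from this range
```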

&lt;p&gt;Thinking back to our target structure, what happens if both clusters utilize the same or overlapping CIDR address ranges? You may remember these error messages when first getting into networking:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvbs8jbly87zo6oetyl8h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvbs8jbly87zo6oetyl8h.png" alt="Image description" width="800" height="217"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Common error messages when trying to connect two networks.&lt;/p&gt;

&lt;p&gt;Errors don’t look like this in K8s. No alert pops up warning you that your clusters can’t communicate effectively.&lt;/p&gt;

&lt;p&gt;If one cluster has a given IP space, and another cluster uses the same or an overlapping IP space, how does each cluster know when a particular packet needs to leave its own address space, route through the VPC network to the other cluster, and then into that cluster’s network?&lt;/p&gt;

&lt;p&gt;By default, there really is no hint here. There are some ways around this, but at a high level, if you’re overlapping, you’re asking for a bad time. The point is that you need to understand the address space of each cluster and then carefully plan the assignment and usage of those IPs. This allows the Linux kernel (where K8s routing happens) and the VPC network layer to forward and route packets as appropriate.&lt;/p&gt;

&lt;p&gt;But, what if you don’t have enough IPs? In some cases, you can’t give every pod its own IP address. So, in this case, you would need to take a step back and determine what services absolutely must have a unique address and what services can be running together in the same address space. For example, if your database here needs to be able to talk to each and every other pod, it probably needs its own unique address. But if your application tiers in the East Coast and in the West Coast are just talking to their local data layer, they can have their own dedicated K8s clusters with the same address range and avoid conflict.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3htabed66ain2r3727fb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3htabed66ain2r3727fb.png" alt="Image description" width="800" height="833"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Flattening out the network.&lt;/p&gt;

&lt;p&gt;In our reference deployment, we dedicated non-overlapping ranges to the layers of infrastructure that MUST be unique across K8s clusters, and reused overlapping CIDR ranges where services will never communicate with each other. Ultimately, what we’re doing here is flattening out the network.&lt;/p&gt;

&lt;p&gt;With non-overlapping IP ranges, we can now move on to routing packets to pods in each cluster. In the figure above, you can see the West Coast nodes sit in 10.100 and the East Coast nodes in 10.150. The K8s clusters have their own pod IP spaces, 10.200 versus 10.250, and those ranges are sliced across the workers just like they were previously.&lt;/p&gt;

&lt;h1&gt;How to handle routing between the Cassandra data centers&lt;/h1&gt;

&lt;p&gt;So, we have a bunch of IP addresses, and each of them is unique. Now, how do we handle the routing, communication, and discovery across all of this? There’s no way for packets originating in cluster A to know, on their own, how to reach cluster B. When we attempt to send a packet across cluster boundaries, the local Linux networking stack sees that the destination is not local to this host or to any host within the local K8s cluster. It then forwards the packet on to the VPC network. From there, our cloud provider must have a routing table entry to understand where the packet needs to go.&lt;/p&gt;

&lt;p&gt;In some cases this will just work out of the box: the VPC routing table is updated with the pod and service CIDR ranges, indicating which hosts packets for those ranges should be routed to. In other environments, including hybrid and on-premises ones, this may take the form of advertising the routes to the networking layer via BGP. Yahoo! Japan has a great &lt;a href="https://kubernetes.io/blog/2016/10/kubernetes-and-openstack-at-yahoo-japan/" rel="noopener noreferrer"&gt;article&lt;/a&gt; covering this exact deployment method.&lt;/p&gt;
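&lt;p&gt;Conceptually, the VPC route table is doing a longest-prefix match over those CIDR entries. Here’s a toy sketch of that lookup in Python (the cluster names and ranges are hypothetical, and a real route table lives in the cloud provider’s network layer, not in your code):&lt;/p&gt;

```python
import ipaddress

# Hypothetical VPC route table: each pod CIDR maps to a next hop.
routes = {
    ipaddress.ip_network("10.200.0.0/16"): "west-cluster",
    ipaddress.ip_network("10.250.0.0/16"): "east-cluster",
    ipaddress.ip_network("0.0.0.0/0"): "internet-gateway",
}

def next_hop(dst):
    """Pick the most specific matching route, as a VPC route table would."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in routes if addr in net]
    return routes[max(matches, key=lambda net: net.prefixlen)]

print(next_hop("10.250.3.7"))     # east-cluster
print(next_hop("93.184.216.34"))  # internet-gateway
```

&lt;p&gt;Note how overlapping pod CIDRs would break this immediately: two equally specific routes for the same address leave the table with no correct answer.&lt;/p&gt;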

&lt;p&gt;However, these options might not always be the best answer. What does your multi-cluster architecture look like? Is it contained within a single cloud provider, or is it hybrid or multi-cloud, combining on-prem infrastructure with one or more cloud providers? While you could certainly instrument all of that across all those different environments, you can count on it requiring a lot of time and upkeep.&lt;/p&gt;

&lt;h1&gt;Some solutions to consider&lt;/h1&gt;

&lt;h2&gt;Overlay networks&lt;/h2&gt;

&lt;p&gt;An easier answer is to use an overlay network, in which you build out a separate IP address space for your application (in this case, a Cassandra database) and run it on top of the existing K8s network, leveraging proxies, sidecars and gateways. We won’t go too far into that in this post, but we have some great content on &lt;a href="https://dtsx.io/3l8sMPN" rel="noopener noreferrer"&gt;how to connect stateful workloads across K8s clusters&lt;/a&gt; that shows, at a high level, how to do that.&lt;/p&gt;

&lt;p&gt;So, what’s next? Packets are flowing, but now you have some new K8s shenanigans to deal with. Assuming the network is in place with all the appropriate routing, some connectivity between these clusters exists, at least at the IP layer: pods in Cluster 1 can talk to pods in Cluster 2. But you now also have some new things to think about.&lt;/p&gt;

&lt;h2&gt;Service discovery&lt;/h2&gt;

&lt;p&gt;With a K8s network, identity is transient. Due to cluster events, a pod may be rescheduled and receive a new network address. In some applications this isn’t a problem. In others, like databases, the network address is the identity — which can lead to unexpected behavior. Even though IP addresses may change over time, the storage, and thus the data each pod represents, stays persistent. We must have a way to maintain a mapping of addresses to applications. This is where service discovery comes in.&lt;/p&gt;

&lt;p&gt;In most circumstances service discovery is implemented via DNS within K8s. Even though a pod’s IP address may change, it can have a persistent DNS-based identity that is updated as cluster events occur. This sounds great, but when we enter the world of multi-cluster we have to ensure that our services are discoverable across cluster boundaries. As a pod in Cluster 1, I &lt;em&gt;should&lt;/em&gt; be able to get the address for a pod in Cluster 2.&lt;/p&gt;

&lt;h2&gt;DNS stubs&lt;/h2&gt;

&lt;p&gt;One approach to this conundrum is DNS stubs. In this configuration we configure the K8s DNS services to route requests for a specific domain suffix to our remote cluster(s). With a fully qualified domain name, we can then forward the DNS lookup request to the appropriate cluster for resolution and ultimately routing.&lt;/p&gt;

&lt;p&gt;The gotcha here is that each cluster requires a separate DNS suffix, set through a kubelet flag, which isn’t an option in all flavors of K8s. Some users work around this by using namespace names as part of the FQDN to configure the stub. This works, but it’s a bit of a hack compared to setting up proper cluster suffixes.&lt;/p&gt;
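&lt;p&gt;To make the idea concrete, here’s a small sketch of how those cross-cluster names are composed. The pod, service, and namespace names and the per-cluster DNS suffixes below are placeholders; the actual suffixes depend on how your kubelets are configured:&lt;/p&gt;

```python
# Hypothetical names throughout; substitute your own pod/service/namespace
# and the DNS suffix each cluster was configured with.
def pod_fqdn(pod, service, namespace, cluster_suffix):
    """Build the DNS name for a pod behind a headless service."""
    return f"{pod}.{service}.{namespace}.svc.{cluster_suffix}"

# A pod in cluster 1 reaches a Cassandra node in cluster 2 by using
# cluster 2's suffix; the DNS stub forwards that lookup to cluster 2.
local = pod_fqdn("cassandra-0", "cassandra", "k8ssandra", "cluster1.local")
remote = pod_fqdn("cassandra-0", "cassandra", "k8ssandra", "cluster2.local")

print(local)   # cassandra-0.cassandra.k8ssandra.svc.cluster1.local
print(remote)  # cassandra-0.cassandra.k8ssandra.svc.cluster2.local
```

&lt;p&gt;The only difference between the two names is the cluster suffix, which is exactly what the stub-domain routing keys on.&lt;/p&gt;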

&lt;h2&gt;Managed DNS&lt;/h2&gt;

&lt;p&gt;Another solution similar to DNS stubs is to use a managed DNS product. In the case of GCP there is the &lt;a href="https://cloud.google.com/dns" rel="noopener noreferrer"&gt;Cloud DNS&lt;/a&gt; product, which handles replicating local DNS entries up to the VPC level for resolution by outside clusters, or even virtual machines within the same VPC. This option offers a lot of benefits, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Removing the overhead of managing the cluster-hosted DNS server — Cloud DNS requires no scaling, monitoring, or managing of DNS instances, because it is a hosted Google service.&lt;/li&gt;
&lt;li&gt;Local resolution of DNS queries on each Google K8s Engine (GKE) node — Similar to NodeLocal DNSCache, Cloud DNS caches DNS responses locally, providing low-latency, highly scalable DNS resolution.&lt;/li&gt;
&lt;li&gt;Integration with &lt;a href="https://cloud.google.com/stackdriver/docs" rel="noopener noreferrer"&gt;Google Cloud’s operations suite&lt;/a&gt; — This provides for DNS monitoring and logging.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://cloud.google.com/kubernetes-engine/docs/how-to/cloud-dns#vpc_scope_dns" rel="noopener noreferrer"&gt;VPC scope DNS&lt;/a&gt; — Provides for multi-cluster, multi-environment, and VPC-wide K8s service resolution.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffed1ha8dkh42yoj6jsgn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffed1ha8dkh42yoj6jsgn.png" alt="Image description" width="800" height="477"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Replicated managed DNS for multi-cluster service discovery.&lt;/p&gt;

&lt;p&gt;Cloud DNS abstracts away a lot of the traditional overhead: the cloud provider manages the scaling, the monitoring, the security patches, and all the other aspects you would expect from a managed offering. Some providers add further benefits, too; GKE, for example, provides a node-local DNS cache, which reduces latency by caching responses at a lower level so that you’re not waiting on a remote DNS response.&lt;/p&gt;

&lt;p&gt;For the long term, a managed service specifically for DNS will work fine if you’re only in a single cloud. But, if you’re spanning clusters across multiple cloud providers and your on-prem environment, managed offerings may only be part of the solution.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://www.cncf.io/" rel="noopener noreferrer"&gt;Cloud Native Computing Foundation&lt;/a&gt; (CNCF) provides a multitude of options, and there are tons of open source projects that really have come a long way in helping to alleviate some of these pain points, especially in that cross-cloud, multi-cloud, or hybrid-cloud type of scenario.&lt;/p&gt;

&lt;p&gt;Curious to learn more about (or play with) Cassandra itself? We recommend trying it on the &lt;a href="https://astra.dev/3iLJEdl" rel="noopener noreferrer"&gt;Astra DB&lt;/a&gt; free plan for the fastest setup.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Follow the &lt;a href="https://dtsx.io/3B9gc8E" rel="noopener noreferrer"&gt;DataStax Tech Blog&lt;/a&gt; for more developer stories. Check out our &lt;a href="https://dtsx.io/3a1Kz4W" rel="noopener noreferrer"&gt;YouTube&lt;/a&gt; channel for tutorials, and follow DataStax Developers on &lt;a href="https://dtsx.io/2Ym54qA" rel="noopener noreferrer"&gt;Twitter&lt;/a&gt; for the latest news about our developer community.&lt;/em&gt;&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
