<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Jesper Axelsen</title>
    <description>The latest articles on DEV Community by Jesper Axelsen (@jaxels10).</description>
    <link>https://dev.to/jaxels10</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F585473%2F16287ee2-45e0-4f2e-a919-d5eaee113cf4.jpg</url>
      <title>DEV Community: Jesper Axelsen</title>
      <link>https://dev.to/jaxels10</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/jaxels10"/>
    <language>en</language>
    <item>
      <title>The simple, yet powerful testing framework hidden in FluxCD</title>
      <dc:creator>Jesper Axelsen</dc:creator>
      <pubDate>Tue, 30 Sep 2025 17:45:18 +0000</pubDate>
      <link>https://dev.to/jaxels10/the-secret-powerful-and-simple-testing-framework-hidden-in-fluxcd-35df</link>
      <guid>https://dev.to/jaxels10/the-secret-powerful-and-simple-testing-framework-hidden-in-fluxcd-35df</guid>
      <description>&lt;h1&gt;
  
  
  Testing Deployments in Kubernetes with FluxCD and Helm
&lt;/h1&gt;

&lt;p&gt;Throughout my career, I’ve helped set up numerous production-grade Kubernetes clusters. While there are many problem domains in that area, today I’ll be focusing on &lt;strong&gt;testing&lt;/strong&gt;.  &lt;/p&gt;




&lt;h2&gt;
  
  
  The Challenge of Testing in Kubernetes
&lt;/h2&gt;

&lt;p&gt;Testing in Kubernetes has never been easy. Questions such as &lt;em&gt;“What kind of testing would you like to do?”&lt;/em&gt; or &lt;em&gt;“When should the tests run?”&lt;/em&gt; are difficult to answer — not just in Kubernetes, but in DevOps as a whole.  &lt;/p&gt;

&lt;p&gt;When using &lt;strong&gt;GitOps&lt;/strong&gt; as a deployment model on Kubernetes, one of its core benefits is &lt;strong&gt;fast deployment&lt;/strong&gt;. GitOps promises that &lt;strong&gt;committed code&lt;/strong&gt; enters a reconciliation loop, eventually resulting in scheduled workloads on, for example, a Kubernetes cluster.  &lt;/p&gt;




&lt;h2&gt;
  
  
  Why Testing Matters in GitOps Workflows
&lt;/h2&gt;

&lt;p&gt;Since code is typically committed many times a day, any testing framework that runs alongside a deployment should be &lt;strong&gt;easy to use&lt;/strong&gt;, &lt;strong&gt;flexible&lt;/strong&gt;, and ideally &lt;strong&gt;integrated&lt;/strong&gt; into your deployment system.  &lt;/p&gt;

&lt;p&gt;This is where &lt;strong&gt;FluxCD&lt;/strong&gt; comes in.  &lt;/p&gt;




&lt;h2&gt;
  
  
  Introducing FluxCD
&lt;/h2&gt;

&lt;p&gt;FluxCD is a delivery solution for Kubernetes that implements a &lt;strong&gt;GitOps operator&lt;/strong&gt;.&lt;br&gt;&lt;br&gt;
I won’t go into detail about GitOps operators or FluxCD in general — you can explore those topics in the &lt;a href="https://fluxcd.io/" rel="noopener noreferrer"&gt;official documentation&lt;/a&gt;.  &lt;/p&gt;

&lt;p&gt;Instead, I want to highlight a powerful feature from their &lt;a href="https://fluxcd.io/flux/components/helm/helmreleases/" rel="noopener noreferrer"&gt;&lt;code&gt;HelmRelease&lt;/code&gt; CRD&lt;/a&gt;.  &lt;/p&gt;




&lt;h2&gt;
  
  
  Leveraging Helm Tests with FluxCD
&lt;/h2&gt;

&lt;p&gt;The idea is simple: when deploying to Kubernetes, we want to run a series of tests to ensure that our deployment is successful.  &lt;/p&gt;

&lt;p&gt;Helm provides an excellent &lt;a href="https://helm.sh/docs/topics/chart_tests/" rel="noopener noreferrer"&gt;&lt;strong&gt;testing framework&lt;/strong&gt;&lt;/a&gt;, and FluxCD integrates directly with it. In the &lt;a href="https://fluxcd.io/flux/components/helm/helmreleases/#test-configuration" rel="noopener noreferrer"&gt;test configuration&lt;/a&gt; section of the documentation, FluxCD specifically enables us to run Helm Tests as part of both &lt;strong&gt;install&lt;/strong&gt; and &lt;strong&gt;upgrade&lt;/strong&gt; processes for any &lt;code&gt;HelmRelease&lt;/code&gt;.  &lt;/p&gt;
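&lt;p&gt;As a sketch, enabling this in a &lt;code&gt;HelmRelease&lt;/code&gt; only takes a few lines (the release name, chart, and intervals below are placeholder values):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: my-app          # placeholder name
spec:
  interval: 10m
  chart:
    spec:
      chart: my-app     # placeholder chart
      sourceRef:
        kind: HelmRepository
        name: my-repo   # placeholder repository
  test:
    enable: true        # run the chart's Helm tests after install and upgrade
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;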

&lt;p&gt;This means we can write tests in Helm and have them automatically executed during an install or upgrade. The deployment will only be reported as successful to the FluxCD controller once all tests pass.&lt;/p&gt;
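&lt;p&gt;On the Helm side, a test is simply a pod template annotated with the &lt;code&gt;helm.sh/hook: test&lt;/code&gt; annotation, typically placed under &lt;code&gt;templates/tests/&lt;/code&gt; in the chart. A minimal sketch, in the style of the connection test that &lt;code&gt;helm create&lt;/code&gt; scaffolds (the service name and port are placeholders):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: v1
kind: Pod
metadata:
  name: my-app-test-connection
  annotations:
    "helm.sh/hook": test
spec:
  restartPolicy: Never
  containers:
    - name: wget
      image: busybox
      command: ["wget"]
      args: ["my-app:80"]   # placeholder service name and port
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;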

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuy0x9pk3kh5fwq6u0cgy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuy0x9pk3kh5fwq6u0cgy.png" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Automatic Rollbacks and Safer Deployments
&lt;/h2&gt;

&lt;p&gt;This is a powerful capability — one that I haven’t seen utilized often.&lt;br&gt;&lt;br&gt;
Essentially, we can instruct our deployment controller to &lt;strong&gt;roll back&lt;/strong&gt; a deployment automatically if any tests fail.  &lt;/p&gt;

&lt;p&gt;We can also run tests during installation to verify that specific resources are created or configured correctly, ensuring that everything behaves as expected before marking the deployment as complete.&lt;/p&gt;
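&lt;p&gt;This behavior is configured through the remediation settings of the &lt;code&gt;HelmRelease&lt;/code&gt;. A sketch of the relevant fields (the retry count is an arbitrary choice):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;spec:
  test:
    enable: true
  install:
    remediation:
      retries: 3            # retry a failed install up to 3 times
  upgrade:
    remediation:
      retries: 3
      strategy: rollback    # roll back when the upgrade or its tests fail
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;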




&lt;h2&gt;
  
  
  Confidence and Control in Your Deployments
&lt;/h2&gt;

&lt;p&gt;Running Helm tests during every install and upgrade in FluxCD gives us a &lt;strong&gt;high level of confidence&lt;/strong&gt; that our deployment is successful.  &lt;/p&gt;

&lt;p&gt;It also allows us to define &lt;strong&gt;specific tests&lt;/strong&gt; that serve as &lt;strong&gt;rollback triggers&lt;/strong&gt; when they fail. With this approach, you gain the confidence and control that many large organizations require for production deployments.  &lt;/p&gt;

&lt;p&gt;In the environments where I’ve implemented this setup, install and upgrade errors have been reduced to &lt;strong&gt;almost zero&lt;/strong&gt;. And when an error does occur, it’s straightforward to pinpoint the issue — since a failing test will clearly indicate what went wrong.  &lt;/p&gt;

&lt;p&gt;If a deployment error slips through without any test failing, remember to &lt;strong&gt;add a test&lt;/strong&gt; covering that scenario in the future. Continuous improvement of test coverage is key to long-term reliability.  &lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>devops</category>
      <category>coding</category>
      <category>testing</category>
    </item>
    <item>
      <title>Ceph data durability, redundancy, and how to use Ceph</title>
      <dc:creator>Jesper Axelsen</dc:creator>
      <pubDate>Fri, 12 Mar 2021 11:35:55 +0000</pubDate>
      <link>https://dev.to/itminds/ceph-data-durability-redundancy-and-how-to-use-ceph-2ml0</link>
      <guid>https://dev.to/itminds/ceph-data-durability-redundancy-and-how-to-use-ceph-2ml0</guid>
      <description>&lt;p&gt;&lt;em&gt;This blog post is the second in a series concerning Ceph.&lt;/em&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Creating data redundancy
&lt;/h1&gt;

&lt;p&gt;One of the main concerns when dealing with large sets of data is data durability. We do not want a cluster in which a simple disk failure causes data loss. What Ceph aims for instead is fast recovery from any type of failure occurring on a specific failure domain. &lt;/p&gt;

&lt;p&gt;Ceph is able to ensure data durability by using either replication or erasure coding. &lt;/p&gt;

&lt;h2&gt;
  
  
  Replication
&lt;/h2&gt;

&lt;p&gt;For those of you who are familiar with RAID, you can think of Ceph's replication as RAID 1 but with subtle differences. &lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbps2b6bzwdye6fdtr597.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbps2b6bzwdye6fdtr597.png" alt="Alt Text" width="150" height="231"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The data is replicated onto a number of different OSDs, nodes, or racks depending on your cluster configuration. The original data and the replicas are split into many small chunks and evenly distributed across your cluster using the CRUSH algorithm. If you have chosen three replicas on a 6-node cluster, the replica chunks will be spread across all six nodes, not concentrated on three nodes each holding a full copy. &lt;/p&gt;

&lt;p&gt;It is important to choose the right level of data replication. If you are running a single-node cluster, replication on the node level would be impossible and your cluster would lose data in the event of a single OSD failure. In this case, you would choose to replicate data across the OSDs you have available on the node. &lt;/p&gt;
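&lt;p&gt;With Rook, for example, the replication level and failure domain are set on the pool definition. A sketch of a pool for a single-node cluster, replicating across OSDs rather than hosts (assuming the Rook &lt;code&gt;CephBlockPool&lt;/code&gt; CRD; the pool name is a placeholder):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: single-node-pool
spec:
  failureDomain: osd   # replicate across OSDs, not nodes
  replicated:
    size: 3            # keep three copies of every object
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;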

&lt;p&gt;On a multi-node cluster, your replication factor decides how many OSDs or nodes you can afford to lose in case of disk or node failure, without data loss. Of course, the replication of data introduces the problem of lowering your total amount of space available in your cluster. If you choose a replication factor of 3 on the node level, you will only have 1/3 of your total storage available in your cluster for you to use. &lt;/p&gt;

&lt;p&gt;Replication in Ceph is fast and only limited by the read/write operations of the OSDs. However, some people are not content with "only" being able to use a small amount of their total space. Therefore, Ceph also introduced erasure coding. &lt;/p&gt;
&lt;h2&gt;
  
  
  Erasure Coding
&lt;/h2&gt;

&lt;p&gt;Erasure coding encodes your original data in such a way that only a subset of the stored fragments is needed to reconstruct the original information. It splits objects into &lt;em&gt;k&lt;/em&gt; data fragments and computes &lt;em&gt;m&lt;/em&gt; parity fragments. I will provide an example. &lt;/p&gt;

&lt;p&gt;Let us say that the value of our data is 52. We could split it into: &lt;br&gt;
&lt;code&gt;x = 5&lt;/code&gt;&lt;br&gt;
&lt;code&gt;y = 2&lt;/code&gt; &lt;/p&gt;

&lt;p&gt;The encoding process will then compute a number of parity fragments. In this example, these will be equations: &lt;br&gt;
&lt;code&gt;x + y = 7&lt;/code&gt;&lt;br&gt;
&lt;code&gt;x - y = 3&lt;/code&gt;&lt;br&gt;
&lt;code&gt;2x + y = 12&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Here, we have &lt;em&gt;k = 2&lt;/em&gt; and &lt;em&gt;m = 3&lt;/em&gt;. &lt;em&gt;k&lt;/em&gt; is the number of data fragments and &lt;em&gt;m&lt;/em&gt; is the number of parity fragments. If a disk or node fails and the data needs to be recovered, we only require any two of the five stored elements (the two data fragments and the three parity fragments). For example, if only &lt;code&gt;x + y = 7&lt;/code&gt; and &lt;code&gt;x - y = 3&lt;/code&gt; survive, solving these two equations recovers &lt;code&gt;x = 5&lt;/code&gt; and &lt;code&gt;y = 2&lt;/code&gt;, and thereby the original value 52. This is what ensures data durability when using erasure coding. &lt;/p&gt;

&lt;p&gt;Now, why does this matter? It matters because these parity fragments take up significantly less space when compared to replicating the data. Here is a table that shows how much overhead there is on different erasure coding schemes. The overhead is calculated with &lt;em&gt;m / k&lt;/em&gt;. &lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Erasure coding scheme &lt;em&gt;(k+m)&lt;/em&gt;
&lt;/th&gt;
&lt;th&gt;Minimum number of nodes&lt;/th&gt;
&lt;th&gt;Storage overhead&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;4+2&lt;/td&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;50%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6+2&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;33%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8+2&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;25%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6+3&lt;/td&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;50%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;As the table shows, the &lt;em&gt;(8+2)&lt;/em&gt; scheme lets you lose two of your nodes without losing any data, while incurring only 25% storage overhead. &lt;/p&gt;

&lt;p&gt;If you look at this from a storage space optimization standpoint, this is a much better use of the storage. However, it is not without certain downsides. The parity fragments take time for the cluster to calculate and read/write operations are therefore slower than with replication. Therefore, erasure coding is usually recommended on clusters that deal with large amounts of &lt;a href="https://www.komprise.com/glossary_terms/cold-data/" rel="noopener noreferrer"&gt;cold data&lt;/a&gt;. &lt;/p&gt;
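&lt;p&gt;In Rook, an erasure coded pool is declared with the &lt;em&gt;k&lt;/em&gt; and &lt;em&gt;m&lt;/em&gt; values from the table above. A sketch of a &lt;em&gt;(4+2)&lt;/em&gt; pool (assuming the Rook &lt;code&gt;CephBlockPool&lt;/code&gt; CRD; the pool name is a placeholder):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: ec-pool
spec:
  failureDomain: host
  erasureCoded:
    dataChunks: 4     # k: data fragments
    codingChunks: 2   # m: parity fragments
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;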
&lt;h1&gt;
  
  
  Using Ceph
&lt;/h1&gt;

&lt;p&gt;A natural part of deployments on Kubernetes is to create persistent volume claims (PVCs). PVCs can claim a volume and use that as storage for data in the pod. In order to create a PVC you first need to define a StorageClass in Kubernetes.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: replicapool
spec:
  failureDomain: host
  replicated:
    size: 3
    requireSafeReplicaSize: true
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
   name: rook-ceph-block
provisioner: rook-ceph.rbd.csi.ceph.com
parameters:
    clusterID: rook-ceph # namespace:cluster
    pool: replicapool
    imageFormat: "2"
    imageFeatures: layering
    csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
    csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph # namespace:cluster
    csi.storage.k8s.io/controller-expand-secret-name: rook-csi-rbd-provisioner
    csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph # namespace:cluster
    csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
    csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph # namespace:cluster
    csi.storage.k8s.io/fstype: ext4
allowVolumeExpansion: true
reclaimPolicy: Delete
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this StorageClass file, you can see that we first create a replica pool that keeps 3 replicas in total and uses &lt;code&gt;host&lt;/code&gt; as the failure domain. After that, we define whether or not volume expansion should be allowed after a volume is created, and what the reclaim policy should be. The reclaim policy determines whether the data stored in the volume is deleted or retained when the persistent volume claim is deleted. In this case, I have chosen &lt;code&gt;Delete&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# kubectl get storageclass -n rook-ceph
NAME              PROVISIONER                     RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
rook-ceph-block   rook-ceph.rbd.csi.ceph.com      Delete          Immediate           true                   10m
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now that the StorageClass has been created, we can create a PVC:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: rook-ceph-block
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This creates a PVC that is now running on our Kubernetes cluster:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# kubectl get pvc -n rook-ceph
NAME      STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      AGE
pvc       Bound    pvc-56c45f01-562f-4222-8199-43abb856ca94   1Gi        RWO            rook-ceph-block   37s
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We will now deploy a pod that uses this PVC:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;---
apiVersion: v1
kind: Pod
metadata:
  name: demo-pod
spec:
  containers:
   - name: web-server
     image: nginx
     volumeMounts:
       - name: mypvc
         mountPath: /var/lib/www/html
  volumes:
   - name: mypvc
     persistentVolumeClaim:
       claimName: pvc
       readOnly: false
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After deploying this pod, you can see it in the pod list:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# kubectl get pods -n rook-ceph
NAME              READY   STATUS    RESTARTS   AGE
demo-pod          1/1     Running   0          118s
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is how you deploy pods that create persistent volume claims on your Ceph cluster!&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>ceph</category>
    </item>
    <item>
      <title>Deploying a Ceph cluster with Kubernetes and Rook</title>
      <dc:creator>Jesper Axelsen</dc:creator>
      <pubDate>Fri, 05 Mar 2021 12:28:49 +0000</pubDate>
      <link>https://dev.to/itminds/deploying-a-ceph-cluster-with-kubernetes-and-rook-1291</link>
      <guid>https://dev.to/itminds/deploying-a-ceph-cluster-with-kubernetes-and-rook-1291</guid>
      <description>&lt;p&gt;&lt;em&gt;This blog post is the first in a series concerning Ceph.&lt;/em&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Introduction
&lt;/h1&gt;

&lt;p&gt;In a world generating ever-increasing amounts of data, the need for scalable storage solutions naturally rises. I am going to introduce you to one of these today. It is called Ceph.&lt;/p&gt;

&lt;p&gt;Ceph is an open-source software storage platform. It implements object storage on a distributed computer cluster and provides an interface for three storage types: block, object, and file. Ceph's aim is to provide a free, distributed storage platform without any single point of failure that is highly scalable and will keep your data intact.&lt;/p&gt;

&lt;p&gt;This post will go through the Ceph architecture, how to set up your own Ceph storage cluster, and discuss the architectural decisions you will inevitably have to make. We will be deploying Ceph on a Kubernetes cluster using the cloud-native storage orchestrator Rook. &lt;/p&gt;

&lt;h1&gt;
  
  
  Architecture
&lt;/h1&gt;

&lt;p&gt;First, a small introduction to Ceph's architecture. &lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjad8334w7b7rk9vzdrdk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjad8334w7b7rk9vzdrdk.png" alt="Ceph architecture" width="693" height="490"&gt;&lt;/a&gt;&lt;br&gt;
A Ceph storage cluster can be accessed in a number of ways. &lt;/p&gt;

&lt;p&gt;First, Ceph provides the LIBRADOS library that allows you to connect directly to your storage cluster using C, C++, Java, Python, Ruby, or PHP. Ceph also allows for object storage through a REST gateway compatible with the S3 and Swift APIs. &lt;/p&gt;

&lt;p&gt;Using Kubernetes, the more common ways to use your storage cluster are to create persistent volume claims (PVCs) using .yaml files in Kubernetes or to create a POSIX-compliant distributed filesystem. &lt;/p&gt;

&lt;p&gt;Underneath all of this lies the Reliable Autonomic Distributed Object Store (RADOS). RADOS is in charge of managing the underlying daemons that are deployed with Ceph. &lt;/p&gt;

&lt;p&gt;A Ceph storage cluster has these types of daemons: &lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo2pj675bnhwffebwzohz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo2pj675bnhwffebwzohz.png" alt="Alt Text" width="750" height="98"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The object storage daemons (OSDs) handle read/write operations on the disks. They are also in charge of checking that the state of the disk is healthy and reporting back to the monitor daemons. &lt;/li&gt;
&lt;li&gt;The monitor daemons keep a copy of the cluster map and monitor the state of the cluster. These daemons are what ensure high availability if any monitor fails. You will always need an odd number of monitor daemons to keep quorum, and it is recommended to dedicate nodes for the monitor daemons to run on, separate from the storage nodes. &lt;/li&gt;
&lt;li&gt;The manager daemons create and manage a map of clients, and handle reweighting and rebalancing operations.&lt;/li&gt;
&lt;li&gt;The metadata servers manage additional metadata about the file system, specifically permissions, hierarchy, names, timestamps, and owners. &lt;/li&gt;
&lt;/ul&gt;
&lt;h1&gt;
  
  
  Deploying the cluster
&lt;/h1&gt;

&lt;p&gt;Having acquired a rudimentary understanding of Ceph, we are now ready to build our storage cluster. A basic guide on how to set up a Kubernetes cluster on Ubuntu can be found &lt;a href="https://computingforgeeks.com/how-to-setup-3-node-kubernetes-cluster-on-ubuntu-18-04-with-weave-net-cni/" rel="noopener noreferrer"&gt;here&lt;/a&gt;. We will be deploying Ceph on a 3-node cluster where each node will have 2 available drives for Ceph to mount. To confirm that the cluster is up and running, run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# kubectl get nodes
NAME          STATUS   ROLES                  AGE    VERSION
k8s-master    Ready    control-plane,master   110m   v1.20.4
k8s-node-01   Ready    &amp;lt;none&amp;gt;                 105m   v1.20.4
k8s-node-02   Ready    &amp;lt;none&amp;gt;                 105m   v1.20.4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Rook
&lt;/h2&gt;

&lt;p&gt;As previously stated, we will be using &lt;a href="//rook.io"&gt;Rook&lt;/a&gt; as our storage orchestrator. Clone the newest version with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git clone https://github.com/rook/rook.git
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After cloning the repo, navigate to the right folder with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cd rook/cluster/examples/kubernetes/ceph. 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;First, we need to create the necessary custom resource definitions (CRDs) and RoleBindings. Run the command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl create -f crds.yaml -f common.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I will not go through these two files as they are not relevant to the cluster configuration. &lt;/p&gt;

&lt;p&gt;Now, it is time for the Rook operator to be deployed. The Rook operator automates most of the deployment of Ceph. In this example, we will enable the Rook operator to automatically discover any empty drives, mount them, and thereby join them into the cluster as OSDs. The Rook operator is found in &lt;code&gt;operator.yaml&lt;/code&gt;. A multitude of things can be configured in the operator file. Most noteworthy is that resources can be limited, to ensure that certain parts of your cluster do not consume too many resources and slow down other parts. We will go with a standard configuration and only change the following value from &lt;code&gt;false&lt;/code&gt; to &lt;code&gt;true&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;- name: ROOK_ENABLE_DISCOVERY_DAEMON
  value: "true"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This enables the operator to automatically discover the empty drives currently in the cluster, as well as any drives that might be added later, without any input from us as admins. &lt;/p&gt;

&lt;p&gt;Now deploy the Rook operator&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# kubectl create -f operator.yaml
configmap/rook-ceph-operator-config created
deployment.apps/rook-ceph-operator created
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should now be able to see the operator pod and the discovery pods running in the rook-ceph namespace in Kubernetes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# kubectl get pods -n rook-ceph
NAME                                  READY   STATUS    RESTARTS   AGE
rook-ceph-operator-678f988875-r6nc4   1/1     Running   0          83s
rook-discover-4w92b                   1/1     Running   0          41s
rook-discover-gw22p                   1/1     Running   0          41s
rook-discover-kskfx                   1/1     Running   0          41s
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With the operator now running, we are ready to deploy our storage cluster. The storage cluster will be created with the &lt;code&gt;cluster.yaml&lt;/code&gt; file.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cluster configuration
&lt;/h2&gt;

&lt;p&gt;Before deploying a storage cluster, we need to configure the cluster's behavior. A storage solution needs to ensure that data is not lost in case of disk failure and that the system is able to recover quickly if anything was to happen. &lt;/p&gt;

&lt;p&gt;Changing the configurations in &lt;code&gt;cluster.yaml&lt;/code&gt; should be done with caution as you can introduce severe overhead into your cluster and even create a cluster without any data security, safety, or reliability. We will be going through the configurations I find relevant for someone deploying their first cluster.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;mon: 
  count: 3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A standard cluster will have 3 monitor daemons. The optimal number of monitor daemons has been discussed for a long time. The general consensus is that a single monitor pod will leave your cluster in an unhealthy state if the node it runs on goes down. This is obviously not a great choice if you would like to ensure any kind of data durability. The other choice could be to create 5 monitor daemons, which is often regarded as a good idea when a cluster expands to hundreds or thousands of nodes. However, since each monitor keeps an updated copy of the CRUSH map, running 5 monitors on a small cluster can slow it down. The community largely agrees that for most clusters, this should be 3. This introduces another problem, however: if we lose more than one node at the same time, we lose quorum and thereby leave the cluster in an unhealthy state.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;waitTimeoutForHealthyOSDInMinutes: 10
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We have to configure how long to wait for OSDs that are still in the cluster but are non-responsive. This is set in minutes. If you set it too low, you risk that a temporarily unresponsive OSD starts a recovery process that slows down your cluster unnecessarily. However, if you wait too long, you risk permanently losing data if the other OSDs that hold the replicated data also fail. &lt;/p&gt;

&lt;p&gt;There are more things to configure in the &lt;code&gt;cluster.yaml&lt;/code&gt; file. If you would like to use the Ceph dashboard or perhaps monitor your cluster with a monitoring tool like Prometheus, you can also enable these. For now, we will leave the rest of the settings as is and deploy the cluster:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl create -f operator.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
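&lt;p&gt;For reference, the dashboard and Prometheus monitoring mentioned above are toggled with a few fields in &lt;code&gt;cluster.yaml&lt;/code&gt;. A sketch (assuming the &lt;code&gt;CephCluster&lt;/code&gt; spec fields; Prometheus monitoring also requires the Prometheus operator to be installed):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;spec:
  dashboard:
    enabled: true   # serve the Ceph dashboard
  monitoring:
    enabled: true   # create Prometheus monitoring resources
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;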



&lt;p&gt;To see the magic unfold, you can use the command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;watch kubectl get pods -n rook-ceph
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In a couple of minutes, Kubernetes should have deployed all the necessary daemons to have your cluster up and running. You should be able to see the monitor daemons.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;rook-ceph-mon-a-5588866567-vjg99                        1/1     Running     0          4m51s
rook-ceph-mon-b-9bc647c5b-fmbjf                         1/1     Running     0          4m27s
rook-ceph-mon-c-7cd784c4b7-qwwwb                        1/1     Running     0          4m1s
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should also be able to see the OSDs in the cluster. There should be six of them since we have two disks on each of our nodes.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;rook-ceph-osd-0-7b884cfccb-qpqbd                        1/1     Running     0          4m49s
rook-ceph-osd-1-5d4c587cdb-bzstp                        1/1     Running     0          4m48s
rook-ceph-osd-2-857b8786bd-q8wqk                        1/1     Running     0          4m41s
rook-ceph-osd-3-443df7d8er-q9we3                        1/1     Running     0          4m41s
rook-ceph-osd-4-5d47f54f7d-tq6rd                        1/1     Running     0          4m41s
rook-ceph-osd-5-32jkjdkwk2-33jkk                        1/1     Running     0          4m41s
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That was it! You have now created your very own Ceph storage cluster, in which you will be able to create a distributed filesystem and Kubernetes will be able to create PVCs.&lt;/p&gt;

&lt;p&gt;This blog post will be continued next week with more on how Ceph ensures data durability and how to start using your Ceph cluster with Kubernetes. &lt;br&gt;
&lt;em&gt;To be continued...&lt;/em&gt;&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>ceph</category>
      <category>rook</category>
      <category>storage</category>
    </item>
  </channel>
</rss>
