<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Sunit Parekh</title>
    <description>The latest articles on DEV Community by Sunit Parekh (@sunitparekh).</description>
    <link>https://dev.to/sunitparekh</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F261244%2F25321e35-0fb8-403b-85fe-d9343a9ffd68.jpg</url>
      <title>DEV Community: Sunit Parekh</title>
      <link>https://dev.to/sunitparekh</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/sunitparekh"/>
    <language>en</language>
    <item>
      <title>Step by step guide to chaos testing using Litmus Chaos toolkit</title>
      <dc:creator>Sunit Parekh</dc:creator>
      <pubDate>Thu, 04 Nov 2021 13:04:19 +0000</pubDate>
      <link>https://dev.to/sunitparekh/step-by-step-guide-to-chaos-testing-using-litmus-chaos-toolkit-77l</link>
      <guid>https://dev.to/sunitparekh/step-by-step-guide-to-chaos-testing-using-litmus-chaos-toolkit-77l</guid>
      <description>&lt;p&gt;by &lt;a href="https://www.linkedin.com/in/sunitparekh/" rel="noopener noreferrer"&gt;Sunit Parekh&lt;/a&gt; &amp;amp; &lt;a href="https://www.linkedin.com/in/prashanth92/" rel="noopener noreferrer"&gt;Prashanth Ramakrishnan&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In this article we will describe how to perform chaos testing using Litmus (a popular chaos testing tool). &lt;/p&gt;

&lt;p&gt;There are 4 major steps for running any chaos test.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The first step is defining a steady state, which means defining what a healthy system looks like. For a web application, this could mean the home page returns a success response; for a web service, it could mean the health endpoint returns success.&lt;/li&gt;
&lt;li&gt;The second step is introducing chaos by simulating a failure, such as a network bottleneck or a full disk.&lt;/li&gt;
&lt;li&gt;The third step is to verify the steady state, i.e., to check whether the system is still working as expected.&lt;/li&gt;
&lt;li&gt;The fourth step, which is the most important one (even more so if you are running in production), is to roll back the chaos that we caused.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F61d59mv4f9n3m2d6j497.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F61d59mv4f9n3m2d6j497.png" alt="chaos testing as 4 steps"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 0: Kubernetes Cluster with Application running &amp;amp; Monitoring in place
&lt;/h2&gt;

&lt;p&gt;To learn about chaos testing, we first need an application under test. For this demo, we deploy the BookInfo application on a single-node Kubernetes cluster, together with the Istio service mesh and Prometheus, Grafana, Jaeger &amp;amp; Kiali for monitoring.&lt;/p&gt;

&lt;p&gt;0.1) &lt;strong&gt;Setup Kubernetes Cluster:&lt;/strong&gt; Get your Kubernetes cluster up and running with Docker as the container runtime. To keep it simple, install Docker Desktop and start Kubernetes along with it. However, you can also use Minikube, k3d, or kind to set up a local k8s cluster.&lt;/p&gt;

&lt;p&gt;0.2) &lt;strong&gt;Setup Monitoring:&lt;/strong&gt; Next, set up Istio along with the monitoring tools Prometheus, Grafana, Jaeger &amp;amp; Kiali.&lt;/p&gt;

&lt;p&gt;Install Istio:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;istioctl &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--set&lt;/span&gt; &lt;span class="nv"&gt;profile&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;demo &lt;span class="nt"&gt;-y&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Install monitoring:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# set ISTIO_RELEASE_URL to a specific Istio release version (no trailing slash)&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;ISTIO_RELEASE_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;https://raw.githubusercontent.com/istio/istio/release-1.11

kubectl apply &lt;span class="nt"&gt;-f&lt;/span&gt; &lt;span class="nv"&gt;$ISTIO_RELEASE_URL&lt;/span&gt;/samples/addons/jaeger.yaml
kubectl apply &lt;span class="nt"&gt;-f&lt;/span&gt; &lt;span class="nv"&gt;$ISTIO_RELEASE_URL&lt;/span&gt;/samples/addons/prometheus.yaml
kubectl apply &lt;span class="nt"&gt;-f&lt;/span&gt; &lt;span class="nv"&gt;$ISTIO_RELEASE_URL&lt;/span&gt;/samples/addons/grafana.yaml
kubectl apply &lt;span class="nt"&gt;-f&lt;/span&gt; &lt;span class="nv"&gt;$ISTIO_RELEASE_URL&lt;/span&gt;/samples/addons/kiali.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feps7p65rlg079xbcndaa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feps7p65rlg079xbcndaa.png" alt="Istio and monitoring pods installed in istio-system namespace of k8s cluster"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;0.3) &lt;strong&gt;Install Bookinfo&lt;/strong&gt; application with Istio service mesh enabled and envoy sidecar installed.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1rx91wihj5hj1n13szq4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1rx91wihj5hj1n13szq4.png" alt="Bookinfo application overview with 4 microservices and sidecar proxy injected"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Download BookInfo yaml from Istio website: &lt;a href="https://raw.githubusercontent.com/istio/istio/release-1.11/samples/bookinfo/platform/kube/bookinfo.yaml" rel="noopener noreferrer"&gt;https://raw.githubusercontent.com/istio/istio/release-1.11/samples/bookinfo/platform/kube/bookinfo.yaml&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Install Bookinfo with envoy proxy injected as sidecar container into default namespace:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;istioctl kube-inject &lt;span class="nt"&gt;-f&lt;/span&gt; bookinfo.yaml | kubectl apply &lt;span class="nt"&gt;-f&lt;/span&gt; -
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr0jezfz1v72r3cdya2r3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr0jezfz1v72r3cdya2r3.png" alt="Bookinfo application pods running in default namespace"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;0.4) &lt;strong&gt;Verify application&lt;/strong&gt; is running and deployed with the Envoy proxy as a sidecar.&lt;/p&gt;

&lt;p&gt;Do port forwarding for the &lt;code&gt;productpage&lt;/code&gt; service and check in your browser.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl port-forward service/productpage 9080:9080
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you open &lt;code&gt;localhost:9080&lt;/code&gt; in your web browser, you should see something like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk0pm4s6rlu0l91ijvb81.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk0pm4s6rlu0l91ijvb81.png" alt="Product page of Bookinfo application"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So now I have my application up and running inside the k8s cluster.&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 1: Define steady state
&lt;/h2&gt;

&lt;p&gt;The steady state for the Bookinfo application is that the product page keeps rendering without any issues. That is, &lt;code&gt;http://localhost:9080/productpage?u=normal&lt;/code&gt; should return an HTTP &lt;code&gt;200&lt;/code&gt; status code under continuous load.&lt;/p&gt;

&lt;p&gt;To check my steady state condition let me first generate continuous load on my bookinfo application using a command line tool called hey and monitor it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;hey &lt;span class="nt"&gt;-c&lt;/span&gt; 2 &lt;span class="nt"&gt;-z&lt;/span&gt; 200s http://localhost:9080/productpage 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The above command generates continuous load on the product page for 200 seconds with 2 concurrent workers.&lt;/p&gt;
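&lt;p&gt;The steady-state condition can also be checked from a script. The following is a minimal sketch, assuming &lt;code&gt;curl&lt;/code&gt;, &lt;code&gt;seq&lt;/code&gt; and &lt;code&gt;awk&lt;/code&gt; are available. The function names &lt;code&gt;sample_status_codes&lt;/code&gt; and &lt;code&gt;all_ok&lt;/code&gt; and the default values are illustrative and are not part of hey or Litmus.&lt;/p&gt;

```shell
# Illustrative defaults (assumptions); override them from the environment.
URL="${URL:-http://localhost:9080/productpage}"
SAMPLES="${SAMPLES:-10}"     # number of requests to send
INTERVAL="${INTERVAL:-1}"    # pause between requests, in seconds

# Print one HTTP status code per request (curl prints 000 if the request fails).
sample_status_codes() {
  for i in $(seq 1 "$SAMPLES"); do
    curl -s -o /dev/null -w '%{http_code}\n' "$URL"
    sleep "$INTERVAL"
  done
}

# Read status codes on stdin; succeed only if at least one code was read
# and every code is 200.
all_ok() {
  awk 'BEGIN { n = 0; bad = 0 }
       { n++; if ($1 != "200") bad++ }
       END { exit (n == 0 || bad != 0) }'
}
```

&lt;p&gt;With the port forward from step 0.4 still running, a check could look like &lt;code&gt;if sample_status_codes | all_ok; then echo "steady"; else echo "not steady"; fi&lt;/code&gt;.&lt;/p&gt;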

&lt;p&gt;Here is a quick view of the Kiali dashboard showing all pods healthy. The reviews service responds in 100-200 ms, and the ratings service it calls internally responds in 50-60 ms on average.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1e8l1pq0a43qkfxnpvzc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1e8l1pq0a43qkfxnpvzc.png" alt="Bookinfo application — Kiali Dashboard"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 2: Introduce chaos
&lt;/h2&gt;

&lt;p&gt;All set, now time to introduce chaos in the system. Let's first understand Litmus core concepts before we jump into execution.&lt;/p&gt;

&lt;p&gt;2.1) &lt;strong&gt;Install Litmus:&lt;/strong&gt; The first step is to install an operator (Litmus Operator) into the Kubernetes cluster where we would like to introduce chaos. The Litmus operator adds 3 custom resource definitions related to Litmus chaos to the k8s cluster. You can also use Helm charts to install the Litmus operator and its web UI. However, for simplicity I am going with the direct yaml and installing only the operator.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl apply &lt;span class="nt"&gt;-f&lt;/span&gt; https://litmuschaos.github.io/litmus/litmus-operator-v2.2.0.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcnak4yyerladlu9ztpzw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcnak4yyerladlu9ztpzw.png" alt="List of all CRDs added as part of installing Litmus operator"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;2.2) &lt;strong&gt;Setup Experiment:&lt;/strong&gt; Next, we add a specific experiment to the namespace where we would like to introduce chaos. The available experiments are listed on the Litmus ChaosHub. Let's add the pod network latency experiment to the default namespace, where our Bookinfo application is installed.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl apply &lt;span class="nt"&gt;-f&lt;/span&gt; &lt;span class="s2"&gt;"https://hub.litmuschaos.io/api/chaos/2.2.0?file=charts/generic/pod-network-latency/experiment.yaml"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2p2bhbmrb8t4rk0ju9s4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2p2bhbmrb8t4rk0ju9s4.png" alt="pod-network-latency experiment added in default namespace"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;2.3) &lt;strong&gt;Apply Permissions:&lt;/strong&gt; Now we need to give permission using RBAC to allow chaos experiments to run.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl apply &lt;span class="nt"&gt;-f&lt;/span&gt; &lt;span class="s2"&gt;"https://hub.litmuschaos.io/api/chaos/2.2.0?file=charts/generic/pod-network-latency/rbac.yaml"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk1m3u0v9e60ecy08foj6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk1m3u0v9e60ecy08foj6.png" alt="pod-network-latency-sa added in default namespace"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;2.4) &lt;strong&gt;Run Chaos:&lt;/strong&gt; We inject the network delay using a ChaosEngine custom resource. The following yaml, &lt;code&gt;network-delay-engine.yaml&lt;/code&gt; of kind &lt;strong&gt;ChaosEngine&lt;/strong&gt;, introduces a network delay of 2 seconds on the ratings deployment for about 100 seconds, affecting all pods under the deployment. The delay in the ratings service response slows down the reviews service, which in turn adds delay to the product page.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;network-delay-engine.yaml&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;litmuschaos.io/v1alpha1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ChaosEngine&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
 &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;bookinfo-network-delay&lt;/span&gt;
 &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;default&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
 &lt;span class="na"&gt;jobCleanUpPolicy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;retain'&lt;/span&gt;  &lt;span class="c1"&gt;# It can be delete/retain&lt;/span&gt;
 &lt;span class="na"&gt;annotationCheck&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;false'&lt;/span&gt;
 &lt;span class="na"&gt;engineState&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;active'&lt;/span&gt;
 &lt;span class="na"&gt;monitoring&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
 &lt;span class="na"&gt;appinfo&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
   &lt;span class="na"&gt;appns&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;default'&lt;/span&gt;
   &lt;span class="na"&gt;applabel&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;app=ratings'&lt;/span&gt;   &lt;span class="c1"&gt;# application label matching&lt;/span&gt;
   &lt;span class="na"&gt;appkind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;deployment'&lt;/span&gt;     &lt;span class="c1"&gt;# k8s object type&lt;/span&gt;
 &lt;span class="na"&gt;chaosServiceAccount&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pod-network-latency-sa&lt;/span&gt;
 &lt;span class="na"&gt;experiments&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
   &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pod-network-latency&lt;/span&gt;
     &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
       &lt;span class="na"&gt;components&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
         &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
           &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;NETWORK_INTERFACE&lt;/span&gt;
             &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;eth0'&lt;/span&gt;   &lt;span class="c1"&gt;# default interface used by pod   &lt;/span&gt;
           &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;NETWORK_LATENCY&lt;/span&gt;
             &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;2000'&lt;/span&gt;   &lt;span class="c1"&gt;# delay in milliseconds&lt;/span&gt;
           &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;TOTAL_CHAOS_DURATION&lt;/span&gt;
             &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;100'&lt;/span&gt;    &lt;span class="c1"&gt;# chaos duration in seconds&lt;/span&gt;
           &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;PODS_AFFECTED_PERC&lt;/span&gt;
             &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;100'&lt;/span&gt;    &lt;span class="c1"&gt;# percentage of target pods to affect&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The comments in the yaml above explain the individual settings; the Litmus documentation describes each configuration option in detail. Apply the engine to start the chaos run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl apply &lt;span class="nt"&gt;-f&lt;/span&gt; network-delay-engine.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here is a pod watch of the default namespace. Notice the &lt;code&gt;bookinfo-network-delay-runner&lt;/code&gt;, &lt;code&gt;pod-network-latency-rp2aly-vg4xt&lt;/code&gt; and &lt;code&gt;pod-network-latency-helper-hhpofr&lt;/code&gt; pods doing the job of introducing the network delay for the ratings service.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpv6pbnjfkzj1vq7dnf57.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpv6pbnjfkzj1vq7dnf57.png" alt="pods status during chaos testing from start to end"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;2.5) &lt;strong&gt;Observe Result:&lt;/strong&gt; Use the Kubernetes describe command to see the output of the chaos run from the previous step. Let's first notice the increased response time of the reviews service on Kiali.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsaix6zypbi07ijtk0ziu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsaix6zypbi07ijtk0ziu.png" alt="Kiali dashboard showing 2s+ response time for Reviews service"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now let's &lt;code&gt;describe&lt;/code&gt; &lt;code&gt;ChaosEngine&lt;/code&gt; and &lt;code&gt;ChaosResult&lt;/code&gt; to see the result in Litmus custom objects description.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2vqvqxiw22boeaga1yfl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2vqvqxiw22boeaga1yfl.png" alt="Chaos custom resource lookups"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Observe events using describe on &lt;code&gt;chaosengine&lt;/code&gt; custom resource &lt;code&gt;bookinfo-network-delay&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fej378j6008juxdzsbl27.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fej378j6008juxdzsbl27.png" alt="Events stream using describe from chaosengine custom resource bookinfo-network-delay"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Observe events using describe on &lt;code&gt;chaosresult&lt;/code&gt; custom resource &lt;code&gt;bookinfo-network-delay-pod-network-latency&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi4f1tobij0sdlo3kybpe.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi4f1tobij0sdlo3kybpe.png" alt="Events stream using describe from chaosresult custom resource bookinfo-network-delay-pod-network-latency"&gt;&lt;/a&gt;&lt;/p&gt;
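&lt;p&gt;The lookups shown in the screenshots above can be reproduced roughly as follows. This is a sketch that assumes the engine and experiment names used in this article; the &lt;code&gt;KUBECTL&lt;/code&gt; override and the function name &lt;code&gt;show_chaos_outcome&lt;/code&gt; are illustrative additions, not part of Litmus.&lt;/p&gt;

```shell
# KUBECTL can be overridden (for example KUBECTL=echo) to preview the commands.
KUBECTL="${KUBECTL:-kubectl}"
ENGINE="bookinfo-network-delay"     # metadata.name of the ChaosEngine
EXPERIMENT="pod-network-latency"    # experiment listed in the engine
NAMESPACE="default"

show_chaos_outcome() {
  # Events recorded on the engine while the experiment runs.
  "$KUBECTL" describe chaosengine "$ENGINE" -n "$NAMESPACE"
  # In this article the ChaosResult is named after the engine and the experiment.
  "$KUBECTL" describe chaosresult "$ENGINE-$EXPERIMENT" -n "$NAMESPACE"
}
```

&lt;p&gt;Calling &lt;code&gt;show_chaos_outcome&lt;/code&gt; after the run prints both descriptions one after the other.&lt;/p&gt;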

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Repeat chaos testing&lt;/strong&gt; by increasing the delay to 6 seconds (6000 ms) and repeating steps 2.4 and 2.5. In &lt;code&gt;network-delay-engine.yaml&lt;/code&gt;, set the &lt;code&gt;NETWORK_LATENCY&lt;/code&gt; value to &lt;code&gt;'6000'&lt;/code&gt;.&lt;/p&gt;
&lt;/blockquote&gt;
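&lt;p&gt;The latency value can be edited by hand, or with a small helper like the sketch below. It assumes the yaml layout shown in step 2.4, where the quoted &lt;code&gt;value&lt;/code&gt; sits on the line directly after &lt;code&gt;name: NETWORK_LATENCY&lt;/code&gt;. The function name &lt;code&gt;set_network_latency&lt;/code&gt; is hypothetical.&lt;/p&gt;

```shell
# Print the ChaosEngine yaml with a new NETWORK_LATENCY value.
# $1: new latency in milliseconds, $2: path to the yaml file.
# The input file itself is left unchanged.
set_network_latency() {
  # On the NETWORK_LATENCY name line, move to the next line (n) and
  # replace the quoted number found there.
  sed "/name: NETWORK_LATENCY/{n;s/'[0-9]*'/'$1'/;}" "$2"
}
```

&lt;p&gt;For example, &lt;code&gt;set_network_latency 6000 network-delay-engine.yaml&lt;/code&gt; prints the modified yaml, which can be saved to a file and applied as in step 2.4.&lt;/p&gt;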

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8zbtw1zivn5d4jlijdos.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8zbtw1zivn5d4jlijdos.png" alt="Reviews services turning red when 6s delay introduced by ratings service"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq14rdb4wbxgwsc9rz3ld.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq14rdb4wbxgwsc9rz3ld.png" alt="Product page loading with error handled gracefully now showing ratings information"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 3: Verify steady state
&lt;/h2&gt;

&lt;p&gt;During the chaos test time we continuously accessed the system and observed 200 responses for the product page. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3hyt18uslf96afbxh8to.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3hyt18uslf96afbxh8to.png" alt="hey command output showing result of productpage response as all 200 status code"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We also observed a 2-second delay in the reviews service response time on the Kiali dashboard.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fprfg8c3n2axc9279c6hd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fprfg8c3n2axc9279c6hd.png" alt="Kiali dashboard with 2s delay on Reviews service"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 4: Rollback chaos
&lt;/h2&gt;

&lt;p&gt;In our case of network delay, the chaos duration was set to 100 seconds, so the chaos stopped automatically after 100 seconds and there is nothing to roll back. We just observe that our system is back to normal.&lt;/p&gt;

&lt;p&gt;On the Kiali dashboard we see the system returning to normal, with the reviews response time under 100 ms and the ratings response time in the 50-60 ms range.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flq4azweg7wyvrcgxzrql.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flq4azweg7wyvrcgxzrql.png" alt="Kiali dashboard with Review services responses in double digit ms time (all back to normal)"&gt;&lt;/a&gt;&lt;/p&gt;
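&lt;p&gt;If a run ever needs to be halted before the configured duration elapses, one option is to switch the engine's &lt;code&gt;engineState&lt;/code&gt; from &lt;code&gt;active&lt;/code&gt; to &lt;code&gt;stop&lt;/code&gt;. The sketch below assumes that the &lt;code&gt;stop&lt;/code&gt; value is supported by the installed Litmus version, so please check the Litmus documentation before relying on it. The function name &lt;code&gt;stop_chaos&lt;/code&gt; and the &lt;code&gt;KUBECTL&lt;/code&gt; override are illustrative.&lt;/p&gt;

```shell
# KUBECTL can be overridden (for example KUBECTL=echo) to preview the command.
KUBECTL="${KUBECTL:-kubectl}"

# $1: ChaosEngine name, $2: namespace
stop_chaos() {
  # Merge-patch only spec.engineState; the rest of the engine is untouched.
  "$KUBECTL" patch chaosengine "$1" -n "$2" --type merge \
    --patch '{"spec":{"engineState":"stop"}}'
}
```

&lt;p&gt;For the engine in this article the call would be &lt;code&gt;stop_chaos bookinfo-network-delay default&lt;/code&gt;.&lt;/p&gt;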




&lt;h2&gt;
  
  
  Q&amp;amp;A
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;Can I use the Litmus tool with any other container runtime like containerd?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Yes. The steps in this article assume Docker as the container runtime. If you use another runtime such as containerd, please read the configuration documentation on the Litmus website for the settings needed to run chaos experiments.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Where can I find a list of all chaos experiments available?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Litmus provides a set of predefined chaos experiments on the Litmus ChaosHub, but we are not limited to them: we can define our own experiments and run them in our own environments.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;How to debug issues if any?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;While running chaos using the &lt;code&gt;ChaosEngine&lt;/code&gt; custom resource, set &lt;code&gt;jobCleanUpPolicy: 'retain'&lt;/code&gt; to keep the pods in the completed state (instead of deleting them after the chaos run), which makes it possible to look at the pod logs.&lt;/p&gt;




&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;The commands and code above are checked into a public repository on GitHub.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://github.com/sunitparekh/chaos-engg-litmus" rel="noopener noreferrer"&gt;https://github.com/sunitparekh/chaos-engg-litmus&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Watch all of the above in action in the XConf 2021 online conference talk.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://www.youtube.com/watch?v=6Lz_0uNaVMA&amp;amp;list=PL8f-F_Zx8XA-kMENPeMMXT9KKo-x4F_NO&amp;amp;index=3" rel="noopener noreferrer"&gt;https://www.youtube.com/watch?v=6Lz_0uNaVMA&amp;amp;list=PL8f-F_Zx8XA-kMENPeMMXT9KKo-x4F_NO&amp;amp;index=3&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Learn more with the hands-on tutorials on the Litmus site.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://docs.litmuschaos.io/tutorials/" rel="noopener noreferrer"&gt;https://docs.litmuschaos.io/tutorials/&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Online hands-on with the Litmus tool on Katacoda.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://katacoda.com/litmusbot/scenarios/getting-started-with-litmus" rel="noopener noreferrer"&gt;https://katacoda.com/litmusbot/scenarios/getting-started-with-litmus&lt;/a&gt;&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>chaostesting</category>
      <category>chaosengineering</category>
      <category>litmuschaos</category>
    </item>
  </channel>
</rss>
