<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Sudhanshu Prajapati</title>
    <description>The latest articles on DEV Community by Sudhanshu Prajapati (@sudhanshu456).</description>
    <link>https://dev.to/sudhanshu456</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F946372%2Feef7d2f1-7c26-4c23-978e-a9bb60ecb671.jpeg</url>
      <title>DEV Community: Sudhanshu Prajapati</title>
      <link>https://dev.to/sudhanshu456</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/sudhanshu456"/>
    <language>en</language>
    <item>
      <title>How to Implement Feature Flags Using LaunchDarkly</title>
      <dc:creator>Sudhanshu Prajapati</dc:creator>
      <pubDate>Mon, 11 Nov 2024 05:25:17 +0000</pubDate>
      <link>https://dev.to/infracloud/how-to-implement-feature-flags-using-launchdarkly-53km</link>
      <guid>https://dev.to/infracloud/how-to-implement-feature-flags-using-launchdarkly-53km</guid>
      <description>&lt;p&gt;Feature flags (often called &lt;a href="https://martinfowler.com/articles/feature-toggles.html" rel="noopener noreferrer"&gt;feature toggles&lt;/a&gt;) have existed for a long time in the software development process. We have been using feature flags in some way or another without even knowing it. So, let’s first understand what exactly feature flags are before we deep dive.&lt;/p&gt;

&lt;p&gt;In simple words, feature flags help control code paths and user flows. You might have sometimes commented out a line of code to switch to different logic (or used if/else conditional flows). For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;greeter&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
      &lt;span class="n"&gt;greeterLanguageFrench&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;true&lt;/span&gt;  &lt;span class="c1"&gt;# Comment to print greeting in English
&lt;/span&gt;      &lt;span class="c1"&gt;# greeterLanguageFrench = false  # Un comment to print greeting in English 
&lt;/span&gt;
      &lt;span class="nf"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt; &lt;span class="n"&gt;greeterLanguageFrench&lt;/span&gt; &lt;span class="p"&gt;)&lt;/span&gt;
          &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Bonjour monde!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
      &lt;span class="k"&gt;else&lt;/span&gt;
          &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Hello World!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I hope you get the picture of how we’re changing the code path using if/else statements and commenting/uncommenting. However, the feature flag technique doesn’t require you to implement it this way. Instead, you can control the flags remotely and turn them on/off without even changing the code in production. This helps us &lt;a href="https://flagsmith.com/blog/decoupling-deployment-from-release-with-feature-flags/" rel="noopener noreferrer"&gt;decouple deployment from release&lt;/a&gt;. Decoupling deployment from release helps when a team has to build a feature that requires weeks of work, which would otherwise lead to a long-lived feature branch. Such long-lived branches come with their own complexity of merging and releasing.&lt;/p&gt;

&lt;p&gt;To avoid these problems, you could do &lt;a href="https://trunkbaseddevelopment.com/" rel="noopener noreferrer"&gt;trunk-based development&lt;/a&gt; (TBD for short):&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;TBD is a source-control branching model, where developers collaborate on code in a single branch called ‘trunk’ and resist any pressure to create other long-lived development branches by employing documented techniques. They, therefore, avoid merge hell, do not break the build, and live happily ever after.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;And once the whole feature is done, you can enable it via a feature flag. This makes it considerably easier to control when to reveal new features to users, even though the code was already deployed in earlier release iterations. Feature flags also help in releasing a feature to only certain users, such as an internal team of testers, which makes the feedback process faster and safer. You can simply turn the feature off in case it causes latency or ambiguous behavior.&lt;/p&gt;

&lt;p&gt;Let’s look at the diagram below to understand feature flag-driven development.&lt;/p&gt;

&lt;p&gt;&lt;a href="/assets/img/Blog/feature-flag-implementation-using-launchdarkly/feature-flag-driven-development.png" class="article-body-image-wrapper"&gt;&lt;img src="/assets/img/Blog/feature-flag-implementation-using-launchdarkly/feature-flag-driven-development.png" alt="Feature Flag Driven Development"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Feature flags allow us to ship our code to production in smaller commits and deploy it in a dormant state. You can then decide when to turn it on/off, get feedback, and iterate over it.&lt;/p&gt;
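&lt;p&gt;To make the idea concrete, here is a minimal, self-contained sketch of a remotely controllable flag. The &lt;code&gt;FlagStore&lt;/code&gt; class is purely illustrative (in a real setup the store would sync flag state from a management service); it only shows how the deployed code stays unchanged while the flag flips behavior at runtime.&lt;/p&gt;

```python
# Illustrative sketch only: FlagStore is a stand-in for a real flag service.
class FlagStore:
    def __init__(self):
        # In a real setup this state would sync from a remote flag server.
        self._flags = {}

    def set_flag(self, key, value):
        self._flags[key] = value

    def variation(self, key, default):
        return self._flags.get(key, default)

flags = FlagStore()

def greet():
    # The feature ships dormant; flipping the flag changes behavior
    # at runtime, with no code change or redeploy.
    if flags.variation("french-greeting", False):
        return "Bonjour monde!"
    return "Hello World!"

print(greet())                           # → Hello World!
flags.set_flag("french-greeting", True)  # the "remote" toggle
print(greet())                           # → Bonjour monde!
```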

&lt;p&gt;Let's go through some scenarios which can help us understand why we need feature flags.&lt;/p&gt;

&lt;h2&gt;Why do we need feature flags?&lt;/h2&gt;

&lt;h3&gt;Scenario #1 Christmas Theme&lt;/h3&gt;

&lt;p&gt;Have you ever noticed that most online shopping sites switch their website appearance to a holiday theme around the Christmas season? Does that mean they rolled out the theme at the same time? Almost certainly not. The new theme was deployed earlier but not released to users.&lt;/p&gt;

&lt;p&gt;They enable the theme during Christmas by turning on the feature flag. Further, no team wants to release a feature on Christmas. They test &amp;amp; deploy it to production much in advance and control it using feature flags.&lt;/p&gt;

&lt;h3&gt;Scenario #2 Beta Tester&lt;/h3&gt;

&lt;p&gt;Once your feature is deployed in production, you can use feature flags to make it available only to those who opted in to the beta tester program. This helps you get real-time feedback, since your feature is running in production, and decide on the basis of metrics whether to roll it out to everyone. In case the feature has a problem, you will be able to control its blast radius.&lt;/p&gt;
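&lt;p&gt;As a rough sketch of how such gating works (the helper function and the opt-in set below are hypothetical, not a specific vendor API), the flag check can be combined with a membership test:&lt;/p&gt;

```python
# Hypothetical example: gate a deployed feature to opted-in beta testers.
BETA_TESTERS = {"user-42", "user-99"}  # users who opted in to the beta program

def is_feature_on(flag_enabled, user_id):
    """Serve the feature only to beta testers while the rollout is gated."""
    return flag_enabled and user_id in BETA_TESTERS

print(is_feature_on(True, "user-42"))   # → True  (beta tester sees it)
print(is_feature_on(True, "user-7"))    # → False (everyone else does not)
print(is_feature_on(False, "user-42"))  # → False (kill switch hides it for all)
```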

&lt;h3&gt;Scenario #3 Early Access&lt;/h3&gt;

&lt;p&gt;As the name suggests, you can choose specific users/groups, i.e., segments, to make the new feature available to before you roll it out for everyone. This approach also helps with A/B testing and experiments.&lt;/p&gt;

&lt;h3&gt;Scenario #4 Progressive Delivery&lt;/h3&gt;

&lt;p&gt;You can roll out a feature progressively based on metrics like latency, CPU usage, etc. If any of the metrics don’t match your requirements, you can just turn the feature off without affecting the user's experience. A/B testing is one of the rollout strategies you can use alongside progressive delivery. Read our articles on &lt;a href="https://www.infracloud.io/blogs/progressive-delivery-argo-rollouts-blue-green-deployment/" rel="noopener noreferrer"&gt;Blue-Green deployment&lt;/a&gt; and &lt;a href="https://www.infracloud.io/blogs/progressive-delivery-argo-rollouts-canary-deployment/" rel="noopener noreferrer"&gt;Canary deployment&lt;/a&gt; to learn more about progressive delivery strategies.&lt;/p&gt;
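&lt;p&gt;One common building block behind progressive rollouts is a stable percentage bucket: hash each user into a bucket so the same user consistently stays in or out as the percentage grows. The sketch below is illustrative only; it is not LaunchDarkly's actual bucketing algorithm.&lt;/p&gt;

```python
import hashlib

def in_rollout(user_id, flag_key, percent):
    """Deterministically place user_id in a 0-99 bucket for flag_key."""
    digest = hashlib.sha256(f"{flag_key}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) % 100
    return percent > bucket  # e.g. percent=10 admits buckets 0..9

# Growing the rollout from 10% to 50% never kicks out earlier users,
# because each user's bucket is stable across evaluations.
users = [f"user-{i}" for i in range(1000)]
ten = {u for u in users if in_rollout(u, "dark-theme", 10)}
fifty = {u for u in users if in_rollout(u, "dark-theme", 50)}
print(ten.issubset(fifty))  # → True
```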

&lt;h3&gt;Scenario #5 Cascading Failure&lt;/h3&gt;

&lt;p&gt;Imagine a huge team working on multiple features, where a few folks complete inter-dependent features that get shipped in one release. If any of those features starts having issues, it can lead to cascading failure. You could avoid this with a feature flag: turn the problematic feature off until the fix is released, and so control the blast radius.&lt;/p&gt;
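&lt;p&gt;A kill switch for this scenario can be sketched as follows. Everything here is illustrative (the flag dictionary and recommendation functions are made up); the point is that flipping one flag reroutes traffic to the stable path instead of letting the failure cascade:&lt;/p&gt;

```python
# Hypothetical example: a flag guarding a risky new code path.
feature_flags = {"new-recommendations": True}

def stable_recommendations(user_id):
    return ["bestsellers"]  # the proven fallback path

def new_recommendations(user_id):
    # Simulate the inter-dependent feature failing in production.
    raise RuntimeError("downstream service is failing")

def recommendations(user_id):
    if feature_flags["new-recommendations"]:
        try:
            return new_recommendations(user_id)
        except RuntimeError:
            # Contain the blast radius instead of cascading to callers.
            return stable_recommendations(user_id)
    return stable_recommendations(user_id)

print(recommendations("user-1"))  # → ['bestsellers'] (fallback kicked in)
feature_flags["new-recommendations"] = False  # operator flips the kill switch
print(recommendations("user-1"))  # → ['bestsellers'] (risky path never runs)
```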

&lt;p&gt;These are some of the use cases I have listed, but there could be many more; feature flags aren’t limited to only the ones mentioned.&lt;/p&gt;

&lt;p&gt;These are some of the benefits of having feature flags; however, there are pitfalls to using feature flags as well. Let’s take a look at those disadvantages.&lt;/p&gt;

&lt;h2&gt;Pitfalls of feature flags&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Technical debt:&lt;/strong&gt; Introducing feature flags in code also complicates managing and keeping track of them. Flags need to be short-lived or have proper ownership within the team.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Application performance:&lt;/strong&gt; Feature flags can introduce latency in critical systems if not implemented appropriately. It’s better to use feature flags where the added latency is manageable.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multiple code paths:&lt;/strong&gt; When you introduce a feature flag in code, you introduce a new code path, and it becomes quite tricky to test all those code paths. There could be “n” levels of nesting in the code paths if you’re heavily using feature flags in the codebase.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now that we know the benefits and pitfalls of feature flags, let’s talk about implementation.&lt;/p&gt;

&lt;h2&gt;Challenges around feature flag implementation&lt;/h2&gt;

&lt;p&gt;From our discussion so far, the implementation looks relatively easy. Still, it involves nuances; some of the challenges are listed below:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Maintenance&lt;/strong&gt; - Keep track of long-lived feature flags in your existing codebase so new flags don’t conflict with old ones. &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ownership&lt;/strong&gt; - Someone must own the lifecycle of a flag from addition to removal; otherwise, over time, flags add up.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Flag names&lt;/strong&gt; - Names should describe what the flag does in minimal words and should follow a common naming convention throughout the codebase.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Audit history&lt;/strong&gt; - If someone turns a flag “on” or “off”, make sure you know who did it.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It is crucial to track a feature flag's life cycle and remove it when it is no longer needed.&lt;/p&gt;
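&lt;p&gt;One lightweight way to keep the lifecycle honest is to record an owner and an intended removal date alongside each flag. The registry below is an illustrative sketch (the flag names, teams, and dates are made up), not a feature of any particular tool:&lt;/p&gt;

```python
from datetime import date

# Hypothetical registry: every flag declares an owner and a removal deadline.
FLAG_REGISTRY = {
    "dark-theme-button": {"owner": "frontend-team", "remove_by": date(2024, 12, 31)},
    "new-checkout-flow": {"owner": "payments-team", "remove_by": date(2024, 6, 30)},
}

def stale_flags(today):
    """Flags past their removal date - candidates for cleanup."""
    return sorted(key for key, meta in FLAG_REGISTRY.items()
                  if today > meta["remove_by"])

print(stale_flags(date(2024, 7, 1)))  # → ['new-checkout-flow']
```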

&lt;p&gt;In the example below, you can see how to use a conditional statement with a configuration parameter passed into the function. This approach might work for a short-lived feature flag when you don’t have many feature flags to manage.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="nf"&gt;if&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;is_feature_on&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
       &lt;span class="c1"&gt;# do something
&lt;/span&gt;    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
       &lt;span class="c1"&gt;# do something else
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Feature flags can save the day, but they can also turn into a disaster, like what happened on 1 Aug 2012, when a flag-related incident cost &lt;a href="https://www.henricodolfing.com/2019/06/project-failure-case-study-knight-capital.html" rel="noopener noreferrer"&gt;Knight Capital $400M&lt;/a&gt; in a single day.&lt;/p&gt;

&lt;p&gt;These are some important factors to consider while implementing feature flags:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Short-lived vs Long-lived feature flags.&lt;/li&gt;
&lt;li&gt;Naming convention of feature flags.&lt;/li&gt;
&lt;li&gt;Ownership of feature flags.&lt;/li&gt;
&lt;li&gt;Appropriate logging.&lt;/li&gt;
&lt;li&gt;Better feature flag management, aka single pane of glass.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you’d like to go in-depth on implementation best practices, you can read &lt;a href="https://www.infoq.com/articles/feature-flags-gone-wrong/" rel="noopener noreferrer"&gt;this article&lt;/a&gt; by Edith Harbaugh.&lt;/p&gt;

&lt;p&gt;More importantly, we need a proper feature flag management tool in place. Instead of &lt;a href="https://www.split.io/blog/top-10-challenges-when-building-a-feature-flagging-solution-from-the-ground-up/" rel="noopener noreferrer"&gt;building feature flag management&lt;/a&gt; ourselves, we can adopt an existing feature flag management platform like &lt;strong&gt;&lt;a href="https://launchdarkly.com/" rel="noopener noreferrer"&gt;LaunchDarkly&lt;/a&gt;&lt;/strong&gt;, which provides a SaaS platform to manage feature flags and simplifies implementation through the &lt;a href="https://docs.launchdarkly.com/sdk/" rel="noopener noreferrer"&gt;available SDKs&lt;/a&gt;. Apart from LaunchDarkly, there are alternative open source tools; I’ve listed some of them below.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/Flagsmith/flagsmith" rel="noopener noreferrer"&gt;Flagsmith&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/Unleash/unleash" rel="noopener noreferrer"&gt;Unleash&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/tompave/fun_with_flags" rel="noopener noreferrer"&gt;Fun with Flags&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/jnunemaker/flipper" rel="noopener noreferrer"&gt;Flipper&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/togglz/togglz" rel="noopener noreferrer"&gt;Togglz&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/featurehub-io/featurehub" rel="noopener noreferrer"&gt;FeatureHub&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now, let us discuss LaunchDarkly for the scope of this post.&lt;/p&gt;

&lt;h2&gt;What is LaunchDarkly?&lt;/h2&gt;

&lt;p&gt;LaunchDarkly is a SaaS-based feature flag management platform. On a day-to-day basis, it handles &lt;a href="https://launchdarkly.com/how-it-works/" rel="noopener noreferrer"&gt;20 trillion feature requests&lt;/a&gt;. LaunchDarkly covers most feature flag needs; some of its features include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Progressive delivery&lt;/li&gt;
&lt;li&gt;A/B testing and insights&lt;/li&gt;
&lt;li&gt;Multiple ways to release a feature flag&lt;/li&gt;
&lt;li&gt;Scheduled release of feature flags&lt;/li&gt;
&lt;li&gt;Approval gate for feature flags&lt;/li&gt;
&lt;li&gt;Code references - help you manage technical debt by finding feature flag references in the codebase.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;How to implement feature flags using LaunchDarkly&lt;/h2&gt;

&lt;p&gt;We have looked into the benefits of using feature flags and management platforms. Now, we will see those features in action via a simple e-commerce application developed using the Flask web framework and JavaScript.&lt;/p&gt;

&lt;p&gt;This application offers REST APIs for other businesses to list the available products, and allows users to log in/register and save items as favorites. To run this demo application on your local system, clone &lt;a href="https://github.com/infracloudio/launchdarkly-demo" rel="noopener noreferrer"&gt;the launchdarkly-demo repository&lt;/a&gt; and go through the &lt;a href="https://github.com/infracloudio/launchdarkly-demo/blob/master/README.md" rel="noopener noreferrer"&gt;readme&lt;/a&gt; for local setup.&lt;/p&gt;

&lt;p&gt;So, without further ado, let’s begin.&lt;/p&gt;

&lt;h3&gt;How to implement LaunchDarkly?&lt;/h3&gt;

&lt;p&gt;To begin with, you need a LaunchDarkly account for this demo, and you can create a trial account &lt;a href="https://launchdarkly.com/start-trial/" rel="noopener noreferrer"&gt;here&lt;/a&gt;. Once you log in, you will see the &lt;em&gt;Feature Flag&lt;/em&gt; list on the left side of the panel. &lt;/p&gt;

&lt;p&gt;LaunchDarkly will create a project for you named after your account, which is visible above the Production label. It will also create two environments for you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Production&lt;/li&gt;
&lt;li&gt;Test&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://docs.launchdarkly.com/home/organize/environments" rel="noopener noreferrer"&gt;Environments help you segregate rollout rules based on the environment&lt;/a&gt;. Each environment has its own SDK key, which allows the client-side applications to get all flag-associated data specific to that environment. &lt;/p&gt;

&lt;p&gt;For this demo, you need an &lt;strong&gt;&lt;a href="https://docs.launchdarkly.com/sdk/concepts/client-side-server-side#keys" rel="noopener noreferrer"&gt;SDK key&lt;/a&gt;&lt;/strong&gt; and a &lt;strong&gt;&lt;a href="https://docs.launchdarkly.com/sdk/concepts/client-side-server-side#client-side-id" rel="noopener noreferrer"&gt;Client ID&lt;/a&gt;.&lt;/strong&gt; Both of these are available under &lt;em&gt;Account Settings&lt;/em&gt; &amp;gt; &lt;em&gt;Projects&lt;/em&gt;. Click on the project's name to see the available environments and their associated keys. Copy the keys of the &lt;em&gt;Test&lt;/em&gt; environment for this demo.&lt;/p&gt;

&lt;p&gt;&lt;a href="/assets/img/Blog/feature-flag-implementation-using-launchdarkly/projects-keys.png" class="article-body-image-wrapper"&gt;&lt;img src="/assets/img/Blog/feature-flag-implementation-using-launchdarkly/projects-keys.png" alt="Project Keys"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We will use those keys to run our demo application locally. You can find the instructions on “How to run locally” in the DEMO application &lt;a href="https://github.com/infracloudio/launchdarkly-demo" rel="noopener noreferrer"&gt;readme&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="/assets/img/Blog/feature-flag-implementation-using-launchdarkly/Environment-Projects.png" class="article-body-image-wrapper"&gt;&lt;img src="/assets/img/Blog/feature-flag-implementation-using-launchdarkly/Environment-Projects.png" alt="Environments and Projects"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We will need these keys to interact with the &lt;a href="https://pypi.org/project/launchdarkly-server-sdk/" rel="noopener noreferrer"&gt;launchdarkly-server-sdk&lt;/a&gt; for Python and the &lt;a href="https://www.npmjs.com/package/launchdarkly-js-client-sdk" rel="noopener noreferrer"&gt;LaunchDarkly SDK for browser JavaScript&lt;/a&gt;. The SDK client should be used as a singleton rather than creating multiple instances, so we need one SDK instance throughout our Flask application. Let’s look at the basic implementation I followed.&lt;/p&gt;

&lt;p&gt;I created an instance of the Flask application and assigned the client object instance in this &lt;a href="https://github.com/infracloudio/launchdarkly-demo/blob/5903d74d4e918c901ea91bfa9d591cd31b3508c7/app/run.py#L34" rel="noopener noreferrer"&gt;line&lt;/a&gt;. Because of this, I can access the LaunchDarkly client through my application.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;setup_ld_client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;ldclient&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;LDClient&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;featureStore&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;InMemoryFeatureStore&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;LD_SDK_KEY&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;LD_SDK_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;LD_FRONTEND_KEY&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;LD_FRONTEND_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;ld_config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LdConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;sdk_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;LD_SDK_KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;HTTPConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;connect_timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;read_timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;feature_store&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;featureStore&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;inline_users_in_events&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ldclient&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;LDClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ld_config&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="/assets/img/Blog/feature-flag-implementation-using-launchdarkly/flask-application.png" class="article-body-image-wrapper"&gt;&lt;img src="/assets/img/Blog/feature-flag-implementation-using-launchdarkly/flask-application.png" alt="Flask Application"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Flask Application&lt;/p&gt;

&lt;h3&gt;Use Case #1 Progressive Release of Dark Theme&lt;/h3&gt;

&lt;p&gt;Context: The frontend team is building a dark theme, as requested by many users in feedback. So the team decided to roll the feature out first in the location where it has been most requested.&lt;/p&gt;

&lt;p&gt;Fortunately, you can do progressive releases in LaunchDarkly using workflows. This feature comes with the Enterprise plan, but you can still get a sense of how it works; read about &lt;a href="https://docs.launchdarkly.com/home/feature-workflows/workflows?q=worklows" rel="noopener noreferrer"&gt;feature workflows&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="/assets/img/Blog/feature-flag-implementation-using-launchdarkly/progressive-rollout.png" class="article-body-image-wrapper"&gt;&lt;img src="/assets/img/Blog/feature-flag-implementation-using-launchdarkly/progressive-rollout.png" alt="Progressive Rollout"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A workflow that progressively rolls out a flag over time.&lt;/p&gt;

&lt;p&gt;For now, we will go through how LaunchDarkly helps in the JavaScript client side to get feature flag variation and change the appearance of the website.&lt;/p&gt;

&lt;p&gt;To add that feature flag to the LaunchDarkly account, go to &lt;em&gt;Feature Flags&lt;/em&gt; on the left side panel. Click &lt;em&gt;Create Flag&lt;/em&gt; and fill in these values:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Name - Dark Theme Button&lt;/li&gt;
&lt;li&gt;Key - dark-theme-button&lt;/li&gt;
&lt;li&gt;Flag Variation Type - Boolean&lt;/li&gt;
&lt;li&gt;Variation 1 - True&lt;/li&gt;
&lt;li&gt;Variation 2 - False&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="/assets/img/Blog/feature-flag-implementation-using-launchdarkly/create-feature-flag.png" class="article-body-image-wrapper"&gt;&lt;img src="/assets/img/Blog/feature-flag-implementation-using-launchdarkly/create-feature-flag.png" alt="Create Feature Flag"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="/assets/img/Blog/feature-flag-implementation-using-launchdarkly/create-feature-flag-dark-theme.png" class="article-body-image-wrapper"&gt;&lt;img src="/assets/img/Blog/feature-flag-implementation-using-launchdarkly/create-feature-flag-dark-theme.png" alt="Dark Theme Button Feature Flag"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Note: &lt;a href="https://docs.launchdarkly.com/home/flags/variations" rel="noopener noreferrer"&gt;Variations&lt;/a&gt; are flag values to serve based on &lt;a href="https://docs.launchdarkly.com/home/flags/targeting-rules#creating-targeting-rules" rel="noopener noreferrer"&gt;targeting rules&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;To use LaunchDarkly on the client side, you need to add a JavaScript SDK. We will initialize it with the client-id we copied in the first step of the Flask application setup.&lt;/p&gt;

&lt;p&gt;Client ID is used to handle feature flags on the client side. In order to make any feature flag data available to the client side, we need to enable Client-side SDK availability for that feature flag. &lt;/p&gt;

&lt;p&gt;To enable it, go to the dark-theme-button feature flag -&amp;gt; &lt;em&gt;Settings&lt;/em&gt; tab -&amp;gt; Client-side SDK availability -&amp;gt; check &lt;code&gt;SDKs using Client-side ID&lt;/code&gt; and save changes.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight jinja"&gt;&lt;code&gt;{% raw %}
&lt;span class="nt"&gt;&amp;lt;script &lt;/span&gt;&lt;span class="na"&gt;crossorigin=&lt;/span&gt;&lt;span class="s"&gt;"anonymous"&lt;/span&gt; &lt;span class="na"&gt;src=&lt;/span&gt;&lt;span class="s"&gt;"https://unpkg.com/launchdarkly-js-client-sdk@2"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&amp;lt;/script&amp;gt;&lt;/span&gt;

&lt;span class="nt"&gt;&amp;lt;script&amp;gt;&lt;/span&gt;
        &lt;span class="kd"&gt;var&lt;/span&gt; &lt;span class="nx"&gt;ldclient&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;LDClient&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;initialize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="cp"&gt;{{&lt;/span&gt; &lt;span class="nv"&gt;config&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'LD_FRONTEND_KEY'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="cp"&gt;}}&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="cp"&gt;{{&lt;/span&gt; &lt;span class="nv"&gt;user_context&lt;/span&gt; &lt;span class="o"&gt;| &lt;/span&gt;&lt;span class="nf"&gt;safe&lt;/span&gt; &lt;span class="cp"&gt;}}&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;options&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="na"&gt;bootstrap&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="cp"&gt;{{&lt;/span&gt; &lt;span class="nv"&gt;all_flags&lt;/span&gt; &lt;span class="o"&gt;| &lt;/span&gt;&lt;span class="nf"&gt;safe&lt;/span&gt; &lt;span class="cp"&gt;}}&lt;/span&gt;
        &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="kd"&gt;var&lt;/span&gt; &lt;span class="nx"&gt;renderButton&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
         &lt;span class="kd"&gt;var&lt;/span&gt; &lt;span class="nx"&gt;showFeature&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;ldclient&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;variation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;dark-theme-button&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
         &lt;span class="kd"&gt;var&lt;/span&gt; &lt;span class="nx"&gt;displayWidget&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getElementById&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;dark-theme-button&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
         &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;displayWidget&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
             &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;showFeature&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                 &lt;span class="nx"&gt;displayWidget&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;style&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;display&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;block&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
             &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                 &lt;span class="nx"&gt;displayWidget&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;style&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;display&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;none&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
             &lt;span class="p"&gt;}&lt;/span&gt;
         &lt;span class="p"&gt;}&lt;/span&gt;

   &lt;span class="nx"&gt;ldclient&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;waitForInitialization&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;then&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;function&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
         &lt;span class="nf"&gt;renderButton&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="nx"&gt;ldclient&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;on&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;change&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nf"&gt;renderButton&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/script&amp;gt;&lt;/span&gt;
{% endraw %}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, test the feature flag you just created. Once you toggle it on, a button will appear in the left corner.&lt;/p&gt;

&lt;p&gt;&lt;a href="/assets/img/Blog/feature-flag-implementation-using-launchdarkly/Flask-Application-Dark-theme-Button.png" class="article-body-image-wrapper"&gt;&lt;img src="/assets/img/Blog/feature-flag-implementation-using-launchdarkly/Flask-Application-Dark-theme-Button.png" alt="Dark Theme Button"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you toggle that button on, the page should look like this.&lt;/p&gt;

&lt;p&gt;&lt;a href="/assets/img/Blog/feature-flag-implementation-using-launchdarkly/dark-theme-button-toggle-on.png" class="article-body-image-wrapper"&gt;&lt;img src="/assets/img/Blog/feature-flag-implementation-using-launchdarkly/dark-theme-button-toggle-on.png" alt="Dark Theme Button Toggle On"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Use Case #2: Logging Level Feature Flag
&lt;/h3&gt;

&lt;p&gt;Context: Your development team is debugging an issue in an application. You have implemented debug logs throughout the application, but you can’t switch the logger level while the application is running. Changing it via an environment variable would still require restarting the application. Is there another way to do it?&lt;/p&gt;

&lt;p&gt;Yes, you can add a flag that defines the logger level, evaluated before any request is handled, and operate it remotely. Flask provides &lt;em&gt;before_request&lt;/em&gt; to register a function that runs before each request. See the example below.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;    &lt;span class="nd"&gt;@app.before_request&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;setLoggingLevel&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;flask&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;
        &lt;span class="n"&gt;logLevel&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ldclient&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;variation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;set-logging-level&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
             &lt;span class="nf"&gt;get_ld_non_human_user&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
             &lt;span class="n"&gt;logging&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;INFO&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Log level: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;logLevel&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setLevel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;logLevel&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;logging&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getLogger&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;werkzeug&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;setLevel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;logLevel&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;logging&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getLogger&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;setLevel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;logLevel&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;Note: In the above, I’m passing three arguments to &lt;code&gt;ldclient.variation()&lt;/code&gt;: 1. the flag key, 2. the user context, and 3. the default value.&lt;/em&gt;&lt;/p&gt;
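&lt;p&gt;If the flag can’t be evaluated (for example, the client isn’t ready or the flag key doesn’t exist), &lt;code&gt;variation()&lt;/code&gt; falls back to that default value. Here is a minimal sketch of that contract, using a hypothetical stand-in rather than the real SDK:&lt;/p&gt;

```python
# Hypothetical stand-in for ldclient.variation(), for illustration only.
# It models the contract: serve the flag's variation when the flag is known,
# otherwise fall back to the caller-supplied default.
FLAG_STORE = {"set-logging-level": 10}  # flag key -> served variation

def variation(flag_key, user_context, default):
    # The real SDK also evaluates targeting rules against user_context;
    # this sketch only models "known flag, or default".
    return FLAG_STORE.get(flag_key, default)

user = {"key": "sudhanshu"}
print(variation("set-logging-level", user, 20))  # 10 (flag found)
print(variation("unknown-flag", user, 20))       # 20 (falls back to default)
```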

&lt;p&gt;To add that feature flag to the LaunchDarkly account, go to &lt;em&gt;Feature Flags&lt;/em&gt; on the left side panel.&lt;/p&gt;

&lt;p&gt;Click &lt;em&gt;Create Flag&lt;/em&gt; and fill in these values:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Name - Logging Level&lt;/li&gt;
&lt;li&gt;Key - set-logging-level&lt;/li&gt;
&lt;li&gt;Flag Variation Type - Number&lt;/li&gt;
&lt;li&gt;Variation 1 - 10&lt;/li&gt;
&lt;li&gt;Variation 2 - 20&lt;/li&gt;
&lt;/ul&gt;
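&lt;p&gt;These number variations are not arbitrary: they match Python’s standard logging levels, where &lt;code&gt;DEBUG&lt;/code&gt; is 10 and &lt;code&gt;INFO&lt;/code&gt; is 20, so the flag value can be passed straight to &lt;code&gt;setLevel()&lt;/code&gt;:&lt;/p&gt;

```python
import logging

# Python's logging module defines levels as plain integers,
# so the flag's number variations map directly onto them.
assert logging.DEBUG == 10
assert logging.INFO == 20

logger = logging.getLogger("demo")
logger.setLevel(20)  # INFO: debug messages are suppressed
print(logger.isEnabledFor(logging.DEBUG))  # False

logger.setLevel(10)  # DEBUG: debug messages come through
print(logger.isEnabledFor(logging.DEBUG))  # True
```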

&lt;p&gt;&lt;em&gt;Note: Make sure every feature flag is in the same environment as the keys you used to set up the LaunchDarkly client in the application.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Now, go to &lt;code&gt;http://localhost:5000/&lt;/code&gt; and see the logs in the terminal of your running application.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;127.0.0.1 - - &lt;span class="o"&gt;[&lt;/span&gt;29/Sep/2022 14:49:47] &lt;span class="s2"&gt;"GET / HTTP/1.1"&lt;/span&gt; 200 -
INFO:werkzeug:127.0.0.1 - - &lt;span class="o"&gt;[&lt;/span&gt;29/Sep/2022 14:49:47] &lt;span class="s2"&gt;"GET / HTTP/1.1"&lt;/span&gt; 200 -
&lt;span class="o"&gt;[&lt;/span&gt;2022-09-29 14:49:47,972] INFO &lt;span class="k"&gt;in &lt;/span&gt;run: Log level: 20
INFO:arun:Log level: 20
127.0.0.1 - - &lt;span class="o"&gt;[&lt;/span&gt;29/Sep/2022 14:49:47] &lt;span class="s2"&gt;"GET /static/css/custom.css HTTP/1.1"&lt;/span&gt; 200 -
INFO:werkzeug:127.0.0.1 - - &lt;span class="o"&gt;[&lt;/span&gt;29/Sep/2022 14:49:47] &lt;span class="s2"&gt;"GET /static/css/custom.css HTTP/1.1"&lt;/span&gt; 200 -
&lt;span class="o"&gt;[&lt;/span&gt;2022-09-29 14:49:47,983] INFO &lt;span class="k"&gt;in &lt;/span&gt;run: Log level: 20
INFO:app.run:Log level: 20
127.0.0.1 - - &lt;span class="o"&gt;[&lt;/span&gt;29/Sep/2022 14:49:47] &lt;span class="s2"&gt;"GET /static/js/dark-mode.js HTTP/1.1"&lt;/span&gt; 304 -
INFO:werkzeug:127.0.0.1 - - &lt;span class="o"&gt;[&lt;/span&gt;29/Sep/2022 14:49:47] &lt;span class="s2"&gt;"GET /static/js/dark-mode.js HTTP/1.1"&lt;/span&gt; 304 -
&lt;span class="o"&gt;[&lt;/span&gt;2022-09-29 14:49:48,848] INFO &lt;span class="k"&gt;in &lt;/span&gt;run: Log level: 20
INFO:app.run:Log level: 20
127.0.0.1 - - &lt;span class="o"&gt;[&lt;/span&gt;29/Sep/2022 14:49:48] &lt;span class="s2"&gt;"GET /favicon.ico HTTP/1.1"&lt;/span&gt; 404 -
INFO:werkzeug:127.0.0.1 - - &lt;span class="o"&gt;[&lt;/span&gt;29/Sep/2022 14:49:48] &lt;span class="s2"&gt;"GET /favicon.ico HTTP/1.1"&lt;/span&gt; 404 -
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As you can see, the log level is 20 right now because the feature flag is not turned on. Now, go back to LaunchDarkly and turn on the flag via the toggle on its right side.&lt;/p&gt;

&lt;p&gt;&lt;a href="/assets/img/Blog/feature-flag-implementation-using-launchdarkly/Logging-Level-Flag.png" class="article-body-image-wrapper"&gt;&lt;img src="/assets/img/Blog/feature-flag-implementation-using-launchdarkly/Logging-Level-Flag.png" alt="Logging Level"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now, recheck the logs by going to the homepage of the local application.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;INFO:werkzeug:127.0.0.1 - - &lt;span class="o"&gt;[&lt;/span&gt;29/Sep/2022 14:55:11] &lt;span class="s2"&gt;"GET / HTTP/1.1"&lt;/span&gt; 200 -
DEBUG:root:&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s1"&gt;'key'&lt;/span&gt;: &lt;span class="s1"&gt;'sudhanshu'&lt;/span&gt;, &lt;span class="s1"&gt;'ip'&lt;/span&gt;: &lt;span class="s1"&gt;'127.0.0.1'&lt;/span&gt;, &lt;span class="s1"&gt;'email'&lt;/span&gt;: &lt;span class="s1"&gt;'local@machine.com'&lt;/span&gt;, &lt;span class="s1"&gt;'custom'&lt;/span&gt;: &lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s1"&gt;'type'&lt;/span&gt;: &lt;span class="s1"&gt;'machine'&lt;/span&gt;&lt;span class="o"&gt;}}&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;2022-09-29 14:55:12,018] INFO &lt;span class="k"&gt;in &lt;/span&gt;run: Log level: 10
INFO:app.run:Log level: 10
127.0.0.1 - - &lt;span class="o"&gt;[&lt;/span&gt;29/Sep/2022 14:55:12] &lt;span class="s2"&gt;"GET /static/js/dark-mode.js HTTP/1.1"&lt;/span&gt; 304 -
INFO:werkzeug:127.0.0.1 - - &lt;span class="o"&gt;[&lt;/span&gt;29/Sep/2022 14:55:12] &lt;span class="s2"&gt;"GET /static/js/dark-mode.js HTTP/1.1"&lt;/span&gt; 304 -
DEBUG:root:&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s1"&gt;'key'&lt;/span&gt;: &lt;span class="s1"&gt;'sudhanshu'&lt;/span&gt;, &lt;span class="s1"&gt;'ip'&lt;/span&gt;: &lt;span class="s1"&gt;'127.0.0.1'&lt;/span&gt;, &lt;span class="s1"&gt;'email'&lt;/span&gt;: &lt;span class="s1"&gt;'local@machine.com'&lt;/span&gt;, &lt;span class="s1"&gt;'custom'&lt;/span&gt;: &lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s1"&gt;'type'&lt;/span&gt;: &lt;span class="s1"&gt;'machine'&lt;/span&gt;&lt;span class="o"&gt;}}&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;2022-09-29 14:55:12,027] INFO &lt;span class="k"&gt;in &lt;/span&gt;run: Log level: 10
INFO:app.run:Log level: 10
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should be able to see the log level 10 (debug) logs just by turning on the toggle from the LaunchDarkly platform. If you want to turn on debug logs in the future, it will be just a toggle away, with no need for an application restart.&lt;/p&gt;

&lt;h3&gt;
  
  
  Use Case #3: Adding a new field in API response
&lt;/h3&gt;

&lt;p&gt;Context: An API team developer wants to add a new field, &lt;code&gt;count&lt;/code&gt;, to the API response. This field will give end users the number of products returned in the response. The API team lead decided to first validate that the API response latency stays within a reasonable range and roll the feature out to a few beta users for feedback before releasing it to everyone.&lt;/p&gt;

&lt;p&gt;You can see how I’m evaluating the feature flag using &lt;em&gt;ldclient&lt;/em&gt; to get the current variation of the flag, with a default value as a fallback. For the sake of simplicity, this is how I’m implementing it in the Flask application.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@api.route&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;/fashion&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;methods&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;GET&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="nd"&gt;@token_required&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;list_fashion&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;current_user&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# add a additional field in api response with feature flag
&lt;/span&gt;    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;      
        &lt;span class="n"&gt;query_result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Products&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;filter_by&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                       &lt;span class="n"&gt;product_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;fashion&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;all&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;product_schema&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ProductSchema&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;many&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;successfully retrieved all products&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;product_schema&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dump&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query_result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="c1"&gt;# Feature flag to add a field in api response
&lt;/span&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;current_app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ldclient&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;variation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                               &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;add-field-total&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;      
                               &lt;span class="n"&gt;current_user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_ld_user&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;count&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)})&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;jsonify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;current_app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;debug&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                               &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Something went wrong: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                               &lt;span class="n"&gt;exc_info&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;jsonify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;failed to retrieve all products&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
        &lt;span class="p"&gt;}),&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Before making the request, you need to generate an API token for the application. To generate one, use this curl command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;--location&lt;/span&gt; &lt;span class="nt"&gt;--request&lt;/span&gt; POST &lt;span class="s1"&gt;'localhost:5000/api/login'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="nt"&gt;--header&lt;/span&gt; &lt;span class="s1"&gt;'Content-Type: application/json'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="nt"&gt;--data-raw&lt;/span&gt; &lt;span class="s1"&gt;'{
    "email" : "example@something.com",
    "password" : "12345"
}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once you run that command, copy the token value; we will need it in the next steps.&lt;br&gt;
Now, see the response using the curl command below. You'll see there is no &lt;em&gt;count&lt;/em&gt; in the API response.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;--location&lt;/span&gt; &lt;span class="nt"&gt;--request&lt;/span&gt; GET &lt;span class="s1"&gt;'localhost:5000/api/fashion'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="nt"&gt;--header&lt;/span&gt; &lt;span class="s1"&gt;'Authorization: token PUT_TOKEN_HERE’
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Response:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="o"&gt;{&lt;/span&gt;  
    &lt;span class="s2"&gt;"data"&lt;/span&gt;: &lt;span class="o"&gt;[&lt;/span&gt;...],
    &lt;span class="s2"&gt;"message"&lt;/span&gt;: &lt;span class="s2"&gt;"successfully retrieved all products"&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, we will create a feature flag in LaunchDarkly using the same flow as earlier, with these values:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Name - Add field 'count' in API response.&lt;/li&gt;
&lt;li&gt;Key - add-field-total&lt;/li&gt;
&lt;li&gt;Flag Variation Type - Boolean&lt;/li&gt;
&lt;li&gt;Variation 1 - True&lt;/li&gt;
&lt;li&gt;Variation 2 - False&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;After you create that flag, navigate to the &lt;em&gt;Users&lt;/em&gt; tab on the left side panel. This tab helps you find users who have evaluated flags in that environment.&lt;/p&gt;

&lt;p&gt;&lt;a href="/assets/img/Blog/feature-flag-implementation-using-launchdarkly/Users.png" class="article-body-image-wrapper"&gt;&lt;img src="/assets/img/Blog/feature-flag-implementation-using-launchdarkly/Users.png" alt="Users"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Before we turn on the feature flag we just created, let’s talk about the dialog box we see whenever we flip any feature flag toggle.&lt;/p&gt;

&lt;p&gt;&lt;a href="/assets/img/Blog/feature-flag-implementation-using-launchdarkly/feature-flag-toggle-dialog-box.png" class="article-body-image-wrapper"&gt;&lt;img src="/assets/img/Blog/feature-flag-implementation-using-launchdarkly/feature-flag-toggle-dialog-box.png" alt="Dialog Box"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Dialog Box&lt;/p&gt;

&lt;p&gt;You would’ve noticed two options for the change:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Schedule&lt;/strong&gt; - This helps you set up an approval flow &amp;amp; schedule to change the state of any flags. This feature is part of their Enterprise plan. Read more about it &lt;a href="https://docs.launchdarkly.com/home/feature-workflows/scheduled-changes" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Targeting&lt;/strong&gt; - Using different &lt;a href="https://docs.launchdarkly.com/home/flags/targeting-users" rel="noopener noreferrer"&gt;targeting rules&lt;/a&gt;, we can specify which user should receive what variation. &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now, we will look into targeting and how we can leverage it to release a feature to specific users.&lt;/p&gt;

&lt;h4&gt;
  
  
  Using User Targeting in LaunchDarkly
&lt;/h4&gt;

&lt;p&gt;To use targeting, go to &lt;em&gt;Feature Flags&lt;/em&gt; -&amp;gt; &amp;lt;&lt;em&gt;Feature Flag Name&lt;/em&gt;&amp;gt; -&amp;gt; &lt;em&gt;Targeting&lt;/em&gt; tab.&lt;/p&gt;

&lt;p&gt;&lt;a href="/assets/img/Blog/feature-flag-implementation-using-launchdarkly/User-Targeting.png" class="article-body-image-wrapper"&gt;&lt;img src="/assets/img/Blog/feature-flag-implementation-using-launchdarkly/User-Targeting.png" alt="User Targeting"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Create two users from this page &lt;a href="http://localhost:5000/register" rel="noopener noreferrer"&gt;http://localhost:5000/register&lt;/a&gt; and add them under &lt;em&gt;Feature Flags&lt;/em&gt; -&amp;gt; &amp;lt;&lt;em&gt;Feature Flag Name&lt;/em&gt;&amp;gt; -&amp;gt; &lt;em&gt;Individual targeting&lt;/em&gt;: one user in the &lt;code&gt;True&lt;/code&gt; variation and the other in the &lt;code&gt;False&lt;/code&gt; variation.&lt;/p&gt;

&lt;p&gt;&lt;a href="/assets/img/Blog/feature-flag-implementation-using-launchdarkly/targeting-rules.png" class="article-body-image-wrapper"&gt;&lt;img src="/assets/img/Blog/feature-flag-implementation-using-launchdarkly/targeting-rules.png" alt="Targeting Users"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now, before calling &lt;a href="http://localhost:5000/api/fashion" rel="noopener noreferrer"&gt;http://localhost:5000/api/fashion&lt;/a&gt;, you need to create a token for this user as well. Then use the same curl command from the earlier step to get the list of products.&lt;/p&gt;

&lt;p&gt;Make an API call with those commands for the two different users. You will see the API returning two different response schemas: one contains &lt;code&gt;count&lt;/code&gt; and the other doesn’t, because you released the feature to only one user; for the other user, the response is unchanged.&lt;/p&gt;

&lt;p&gt;&lt;a href="/assets/img/Blog/feature-flag-implementation-using-launchdarkly/api-response-after-targeting-users.png" class="article-body-image-wrapper"&gt;&lt;img src="/assets/img/Blog/feature-flag-implementation-using-launchdarkly/api-response-after-targeting-users.png" alt="API response comparision for two users"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Use Case #4: Disable Registration Page
&lt;/h3&gt;

&lt;p&gt;Context: During a sale, we get a huge surge of traffic from new users, and such a situation can be overwhelming to handle. New customers are good for business, but the sheer load can mean a bad experience for your loyal registered users, who are paying money for better and faster service.&lt;/p&gt;

&lt;p&gt;Below is an example of the HM.com website in maintenance mode. Ideally, this should not happen during your peak sales hours, but sometimes you need to calibrate the inventory before the sale begins, or you simply want to allow only pre-registered customers access to the sale. Product Hunt has a similar story.&lt;/p&gt;

&lt;p&gt;&lt;a href="/assets/img/Blog/feature-flag-implementation-using-launchdarkly/temporary-maintenance-window.png" class="article-body-image-wrapper"&gt;&lt;img src="/assets/img/Blog/feature-flag-implementation-using-launchdarkly/temporary-maintenance-window.png" alt="Sites temporarily closed"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In this case, you’re disabling registration for a few minutes, letting registered users in first. You might be wondering whether this kind of control, where no new users can register for some time, is possible. Yes, it is. See the code below: I’ve created a flag called &lt;em&gt;disable-registration&lt;/em&gt; whose default value is false. Once you turn it on, it redirects all new visitors back home with a message.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@core.route&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;/register&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;methods&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;GET&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;POST&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;register&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;current_user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;is_authenticated&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;redirect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;url_for&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;core.index&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;current_app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ldclient&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;variation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;disable-registration&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  
                                  &lt;span class="n"&gt;current_user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_ld_user&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nf"&gt;flash&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Not accepting new registration, try after sometime&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;redirect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;url_for&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;core.index&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;method&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;POST&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;user&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;User&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;form&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;userEmail&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;User&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;filter_by&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt; &lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;form&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;userEmail&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]).&lt;/span&gt;&lt;span class="nf"&gt;first&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nf"&gt;flash&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Email is already taken. Please choose another email&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;redirect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;url_for&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;core.register&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;form&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;inputPassword&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;form&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;confirmPassword&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
            &lt;span class="nf"&gt;flash&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Passwords must match&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;redirect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;url_for&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;core.register&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

        &lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_password&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;form&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;inputPassword&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;commit&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="nf"&gt;flash&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Congratulations, you are now a registered user!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;login_user&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;redirect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;url_for&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;core.dashboard&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;render_template&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;register.html&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Follow the same steps as earlier to create a feature flag, using the values below.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Name - Disable New Registration&lt;/li&gt;
&lt;li&gt;Key - disable-registration&lt;/li&gt;
&lt;li&gt;Flag Variation Type - Boolean&lt;/li&gt;
&lt;li&gt;Variation 1 - True&lt;/li&gt;
&lt;li&gt;Variation 2 - False&lt;/li&gt;
&lt;/ul&gt;
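&lt;p&gt;Once the flag exists, the register route can check it before doing any work. Here is a minimal sketch of that gate; the dict-backed &lt;code&gt;variation()&lt;/code&gt; helper is a hypothetical stand-in for the LaunchDarkly SDK’s flag evaluation call, kept self-contained so the gating logic is easy to follow:&lt;/p&gt;

```python
# Minimal stand-in for a feature flag client. In the real app this check
# goes through the LaunchDarkly SDK instead of a dict; the flag key
# "disable-registration" matches the one created above.
FLAGS = {"disable-registration": True}

def variation(flag_key, default=False):
    """Return the flag's current value, falling back to a default."""
    return FLAGS.get(flag_key, default)

def register(form):
    """Toy register handler: refuse new sign-ups while the flag is on."""
    if variation("disable-registration"):
        return {"redirect": "/", "flash": "Registration is currently disabled"}
    return {"redirect": "/dashboard", "flash": "Registered " + form["userEmail"]}

print(register({"userEmail": "jane@example.com"})["redirect"])  # prints "/"
```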

&lt;p&gt;Once you turn it on, the register page will stop accepting registrations. Try going to this URL &lt;a href="http://localhost:5000/register" rel="noopener noreferrer"&gt;http://localhost:5000/register&lt;/a&gt;. It should redirect you back to the home page.&lt;/p&gt;

&lt;p&gt;&lt;a href="/assets/img/Blog/feature-flag-implementation-using-launchdarkly/disable-registration-feature-flag.png" class="article-body-image-wrapper"&gt;&lt;img src="/assets/img/Blog/feature-flag-implementation-using-launchdarkly/disable-registration-feature-flag.png" alt="Disable New Registration Feature Flag"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Disable New Registration flag is turned on.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="/assets/img/Blog/feature-flag-implementation-using-launchdarkly/disable-registration-message.png" class="article-body-image-wrapper"&gt;&lt;img src="/assets/img/Blog/feature-flag-implementation-using-launchdarkly/disable-registration-message.png" alt="Disable Registration Message"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;After the Disable Registration flag is turned on.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This Flask demo application has many more feature flags to explore. I’ve listed those flags and their configuration in the readme of the application repository, so you can see them in action. Several features fall under the enterprise plan, which I couldn’t demo in this blog post; however, you can get a clear picture of them from the &lt;a href="https://docs.launchdarkly.com/home/feature-workflows/workflows?q=worklows" rel="noopener noreferrer"&gt;LaunchDarkly documentation&lt;/a&gt;, which is detailed and easy to understand.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In this blog post, we looked at the benefits and drawbacks of using feature flags: how, on a day-to-day basis, they give any team control over the features they release, and how decoupling deployment from release increases developer productivity. Feature flags have helped many companies (Facebook and Instagram, to name a few). Most companies roll out features gradually by geography and user segmentation, so a feature flag management platform like LaunchDarkly becomes a necessity.&lt;/p&gt;

&lt;p&gt;If you’re looking for experts who can help you build a great product and optimize your infrastructure to be reliable, explore why startups and enterprises consider us as their &lt;a href="https://dev.to/cloud-native-product-development/"&gt;cloud native product engineering experts&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://martinfowler.com/articles/feature-toggles.html#FeatureTogglesIntroduceValidationComplexity" rel="noopener noreferrer"&gt;Feature Toggles (aka Feature Flags)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://launchdarkly.com/blog/what-are-feature-flags/" rel="noopener noreferrer"&gt;Feature flags, what are they? - LaunchDarkly&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.launchdarkly.com/home" rel="noopener noreferrer"&gt;LaunchDarkly docs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.atlassian.com/continuous-delivery/principles/feature-flags" rel="noopener noreferrer"&gt;Feature Flags - Atlassian&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://featureflags.io/" rel="noopener noreferrer"&gt;FeatureFlags&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://flagsmith.com/blog/decoupling-deployment-from-release-with-feature-flags/" rel="noopener noreferrer"&gt;Decoupling Deployment from Release with Feature Flags - Flagsmith&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://trunkbaseddevelopment.com/" rel="noopener noreferrer"&gt;Trunk Based Development&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>devops</category>
      <category>tutorial</category>
      <category>tooling</category>
    </item>
    <item>
      <title>Running Phi 3 with vLLM and Ray Serve</title>
      <dc:creator>Sudhanshu Prajapati</dc:creator>
      <pubDate>Fri, 08 Nov 2024 10:46:38 +0000</pubDate>
      <link>https://dev.to/infracloud/running-phi-3-with-vllm-and-ray-serve-4g0f</link>
      <guid>https://dev.to/infracloud/running-phi-3-with-vllm-and-ray-serve-4g0f</guid>
      <description>&lt;p&gt;While everyone is talking about new models and their possible use cases, their deployment aspect often gets overlooked. The journey from a trained model to a production-ready service is a complex and nuanced process that deserves more attention. From the perspective of a web API server, when a developer needs to access information like user profiles or services, we typically create a REST API service that interacts with the database. This API service also handles business logic, enabling the system to process and serve thousands of requests per minute efficiently. However, it is different when we talk about serving models.&lt;/p&gt;

&lt;p&gt;In the pre-production phase, data scientists and machine learning (ML) engineers often test their models locally, loading model weights onto a Compute Unified Device Architecture (CUDA) device using ML libraries like PyTorch to showcase accuracy. While this local execution works excellently for testing, scaling that same model to handle real-time, production-level traffic is an entirely different challenge. Many engineers consider serving the model by wrapping it in a &lt;a href="https://flask.palletsprojects.com/en/3.0.x/" rel="noopener noreferrer"&gt;Flask microservice&lt;/a&gt;. Though a Flask microservice is a simple solution, it quickly becomes unmanageable when dealing with multiple models and serving at scale.&lt;/p&gt;

&lt;p&gt;Additionally, &lt;a href="https://www.infracloud.io/webinars/bringing-observability-to-complex-ai-platforms-and-models/" rel="noopener noreferrer"&gt;monitoring the performance of a model&lt;/a&gt; in production is quite different from monitoring the performance of traditional API servers. Inference requires specialized monitoring for aspects like latency, GPU utilization, and throughput—less relevant metrics for typical API services. This is where &lt;a href="https://www.infracloud.io/blogs/running-llama-3-with-triton-tensorrt-llm/" rel="noopener noreferrer"&gt;inference servers&lt;/a&gt; come into play and provide specialized servers for model serving.&lt;/p&gt;

&lt;p&gt;In this blog post, we will delve into the differences between inference and serving and explore how to deploy the Phi-3 model using vLLM with Ray Serve on Kubernetes, a general-purpose scalable serving layer built on top of Ray.&lt;/p&gt;

&lt;h2&gt;
  
  
  Inference and Serving
&lt;/h2&gt;

&lt;p&gt;Before diving into the details of inference and serving, it's important to understand how they fit into the broader MLOps cycle. MLOps, or Machine Learning Operations, is a set of practices that aim to automate and streamline the process of deploying and maintaining machine learning models in production. It draws parallels to DevOps but specifically focuses on the challenges unique to machine learning.&lt;/p&gt;

&lt;p&gt;The MLOps cycle typically involves several stages, from data collection and model development to deployment, monitoring, and continuous improvement. If you’re new to the concept, I recommend checking out our detailed &lt;a href="https://www.infracloud.io/blogs/introduction-to-mlops/" rel="noopener noreferrer"&gt;introduction to MLOps&lt;/a&gt; for a comprehensive overview.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl4vd3i3gy0108xfw5v4k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl4vd3i3gy0108xfw5v4k.png" alt="Machine Learning Lifecycle" width="800" height="332"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://medium.com/distributed-computing-with-ray/machine-learning-serving-is-broken-f59aff2d607f" rel="noopener noreferrer"&gt;(Image Source)&lt;/a&gt; &lt;/p&gt;

&lt;p&gt;In this cycle, &lt;strong&gt;inference&lt;/strong&gt; and &lt;strong&gt;serving&lt;/strong&gt; come into play in the latter half once a model has been trained and is ready for deployment. Though these terms are often used interchangeably, they refer to different stages in the lifecycle of a model in production.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is inference?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Inference&lt;/strong&gt; is when a trained model takes input data and produces predictions or outputs. In simpler terms, the actual computation happens when a model is asked to generate a result—like classifying an image, translating text, or generating a response in a chatbot. Inference happens locally when testing the model, often using a framework like PyTorch or TensorFlow, and can be run on either CPUs or GPUs.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fga6052vw1m3xmc4z6qsv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fga6052vw1m3xmc4z6qsv.png" alt="Inference" width="800" height="333"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://medium.com/distributed-computing-with-ray/machine-learning-serving-is-broken-f59aff2d607f" rel="noopener noreferrer"&gt;(Image Source)&lt;/a&gt; &lt;/p&gt;

&lt;h3&gt;
  
  
  What is model serving?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Serving&lt;/strong&gt;, on the other hand, refers to making the model accessible as a service. This involves deploying the model in a way that allows it to handle real-time requests, often at scale. When a model is served, it’s not just about running inference but doing so in an optimized, scalable, and monitored environment where it can respond to multiple requests from users or applications in real time. Serving requires integrating the model with APIs, managing resources like GPU/CPU, and ensuring the service is stable and performant over time.&lt;/p&gt;
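&lt;p&gt;The difference can be illustrated with a deliberately tiny sketch: a toy &lt;code&gt;predict&lt;/code&gt; function exposed as an HTTP endpoint using only the Python standard library. Production systems use the specialized inference servers discussed below rather than &lt;code&gt;http.server&lt;/code&gt;, but the jump from ‘a function I can call’ to ‘a service others can call’ is the essence of serving:&lt;/p&gt;

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Toy "model" standing in for real inference.
def predict(text):
    return {"length": len(text), "label": "long" if len(text) > 10 else "short"}

class Handler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON request body, run inference, return JSON.
        body = self.rfile.read(int(self.headers["Content-Length"]))
        result = predict(json.loads(body)["text"])
        payload = json.dumps(result).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):  # keep the demo quiet
        pass

server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
print("serving on port", server.server_address[1])
```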

&lt;p&gt;Since we’re talking about deploying &lt;a href="https://azure.microsoft.com/en-us/blog/introducing-phi-3-redefining-whats-possible-with-slms/" rel="noopener noreferrer"&gt;Phi-3&lt;/a&gt;, a large language model, we will take a look at specialized servers that allow us to deploy LLMs. Some of them are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;vLLM: Works as both an inference engine and an inference server, allowing you to run the LLMs it supports.&lt;/li&gt;
&lt;li&gt;Ray Serve: A framework-agnostic serving library that lets you serve models in the same framework you trained them in, removing the need to convert to a specific format.&lt;/li&gt;
&lt;li&gt;TensorRT-LLM: A specialized inference server for TensorRT-optimized models; to run any other model, you need to convert it into a format TensorRT-LLM supports.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In the scope of this blog, we will be using vLLM as an inference engine, and Ray Serve as a serving library. You can read more about inference servers in our blog post, where we &lt;a href="https://www.infracloud.io/blogs/exploring-ai-model-inference/" rel="noopener noreferrer"&gt;explored AI Model Inference: servers, frameworks, and optimization strategies&lt;/a&gt; for detailed understanding.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is vLLM?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/vllm-project/vllm" rel="noopener noreferrer"&gt;vLLM&lt;/a&gt; stands for virtual large language models. It is one of the open source &lt;a href="https://blog.vllm.ai/2023/06/20/vllm.html" rel="noopener noreferrer"&gt;fast inferencing&lt;/a&gt; and serving libraries. As the name suggests, ‘virtual’ encapsulates the concept of virtual memory and paging from operating systems, which allows addressing the problem of maximum utilization of resources and providing faster token generation by utilizing &lt;a href="https://blog.vllm.ai/2023/06/20/vllm.html" rel="noopener noreferrer"&gt;PagedAttention&lt;/a&gt;. Traditional LLM serving involves storing large attention keys and value tensors in GPU memory, leading to inefficient memory usage.&lt;/p&gt;

&lt;p&gt;LMSYS, or Large Model Systems Organization, adopted vLLM to power &lt;a href="https://chat.lmsys.org" rel="noopener noreferrer"&gt;Chatbot Arena and Vicuna Demo&lt;/a&gt;, handling significant traffic while reducing operational costs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why vLLM?
&lt;/h3&gt;

&lt;p&gt;vLLM is a specialized and efficient library for large language models (LLMs) with several advantages:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Open source and highly adaptable&lt;/strong&gt;: It’s an open source library, making it flexible and accessible for various use cases.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Broad model support&lt;/strong&gt;: It supports a wide range of model architectures, which you can explore further in the &lt;a href="https://docs.vllm.ai/en/latest/models/supported_models.html" rel="noopener noreferrer"&gt;official vLLM documentation&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Advanced monitoring and GPU support&lt;/strong&gt;:

&lt;ul&gt;
&lt;li&gt;Compatible with multiple GPU platforms, such as NVIDIA and AMD GPUs.&lt;/li&gt;
&lt;li&gt;Includes monitoring capabilities to track and manage model performance.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Scalability&lt;/strong&gt;: vLLM comes with built-in scaling mechanisms to handle large models effectively, such as tensor parallelism, pipeline parallelism, and distributed inference.&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Lightweight&lt;/strong&gt;: Despite its powerful features, vLLM remains a lightweight library, making it a strong choice for efficient performance.&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Constantly improving&lt;/strong&gt;: The toolkit continually evolves, with frequent updates and new features added.&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;For the purpose of this blog, we’ll be using vLLM specifically for the &lt;strong&gt;inference&lt;/strong&gt; phase. Next, let’s see how &lt;strong&gt;Ray Serve&lt;/strong&gt; fits into the picture.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where does Ray Serve and KubeRay fit in Kubernetes?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://docs.ray.io/en/latest/index.html" rel="noopener noreferrer"&gt;Ray&lt;/a&gt; is an open source unified framework for AI and Python applications built around the idea of simplified distributed computing. It allows users to run tasks in parallel across multiple nodes or machines, making it ideal for distributed machine learning, reinforcement learning, or parallel processing. One of Ray’s standout features is its high-level libraries, one of them is Ray Serve, designed to streamline model serving for machine learning applications. You can learn in detail in &lt;a href="https://www.infracloud.io/blogs/distributed-parallel-processing-ray-kuberay/" rel="noopener noreferrer"&gt;Primer on Distributed Parallel Processing with Ray&lt;/a&gt; blog post.&lt;/p&gt;

&lt;h3&gt;
  
  
  Ray Serve
&lt;/h3&gt;

&lt;p&gt;As we discussed, serving a model differs from running a traditional web server. Specialized model servers, such as TensorFlow Serving, ONNX Runtime, and TensorRT, package existing models and serve them behind their own APIs. These model servers offer limited flexibility because of their specialized APIs: a developer or data scientist ends up dealing with two servers, a model server and a web server containing the business logic. Add to that the vendor lock-in and the format conversion required to serve models on those servers, which is yet another step in the process.&lt;/p&gt;

&lt;p&gt;This is where Ray Serve helps. It allows you to contain business logic and model inference in the same place, with end-to-end control over the request lifecycle while letting each model scale independently. It supports multi-model serving, traffic splitting, and version control, enabling developers to route requests to specific models or model versions.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2dbyoxyl9qef2beik45e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2dbyoxyl9qef2beik45e.png" alt="Ray Serve" width="800" height="224"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.anyscale.com/blog/why-you-should-build-your-ai-applications-with-ray" rel="noopener noreferrer"&gt;(Image Source)&lt;/a&gt; &lt;/p&gt;

&lt;p&gt;Ray Serve’s integration with Ray’s distributed framework allows models to be served without rewriting the entire application. The Ray Serve library also inherits the features the Ray framework provides, such as easy scaling to many machines and flexible scheduling support, including fractional GPUs, which in turn lowers operational costs. You can read more about Ray Serve &lt;a href="https://docs.ray.io/en/latest/serve/key-concepts.html" rel="noopener noreferrer"&gt;key concepts&lt;/a&gt; and &lt;a href="https://docs.ray.io/en/latest/serve/getting_started.html" rel="noopener noreferrer"&gt;features &amp;amp; use cases&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  KubeRay
&lt;/h3&gt;

&lt;p&gt;KubeRay enables you to run Ray applications on Kubernetes. Since Ray Serve deployments are essentially Ray applications, KubeRay helps deploy them using Custom Resource Definitions (CRDs).&lt;/p&gt;

&lt;p&gt;It includes three CRDs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;RayCluster&lt;/strong&gt;: Manages the lifecycle of Ray clusters, specifying the configuration for head and worker nodes and resource allocation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;RayService&lt;/strong&gt;: This is designed specifically for managing Ray Serve deployments, providing a simple way to configure and deploy serving applications on Ray.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;RayJob&lt;/strong&gt;: Allows users to run batch jobs on Ray, enabling the execution of distributed tasks and workflows within Kubernetes.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For a more detailed exploration of Ray and its capabilities, refer to my previous blog on &lt;a href="https://www.infracloud.io/blogs/distributed-parallel-processing-ray-kuberay/" rel="noopener noreferrer"&gt;Ray on Kubernetes using KubeRay&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Within the scope of this blog, the CRD we’re most interested in is RayService.&lt;/p&gt;

&lt;h4&gt;
  
  
  RayService
&lt;/h4&gt;

&lt;p&gt;The RayService CRD allows you to deploy Ray Serve applications seamlessly on Kubernetes. By defining a RayService, you can specify your Ray Serve deployment's parameters, such as the model to be served, scaling options, and routing configurations. This abstraction simplifies the deployment process and allows you to manage your serving infrastructure through Kubernetes.&lt;/p&gt;

&lt;p&gt;Example of a RayService CRD:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;serving.kubray.io/v1alpha1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;RayService&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;audio-model&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;rayCluster&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-ray-cluster&lt;/span&gt;
  &lt;span class="na"&gt;deployment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;replicas&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;
    &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;AudioModel&lt;/span&gt;
    &lt;span class="na"&gt;routePrefix&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/audio"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this example, the RayService CRD defines a deployment for the &lt;code&gt;AudioModel&lt;/code&gt;, specifying that three replicas should be created to handle incoming requests at the &lt;code&gt;/audio&lt;/code&gt; endpoint. This structure simplifies the deployment and integrates with Kubernetes' existing capabilities.&lt;/p&gt;

&lt;h2&gt;
  
  
  Serving Model on Kubernetes
&lt;/h2&gt;

&lt;p&gt;In this implementation, we will deploy the &lt;a href="https://huggingface.co/microsoft/Phi-3-mini-4k-instruct" rel="noopener noreferrer"&gt;Phi-3-mini-4k-instruct&lt;/a&gt; model by Microsoft, using vLLM as the inference engine and Ray Serve for serving, with the help of KubeRay on Kubernetes.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F429rruf03yexs3lmeh00.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F429rruf03yexs3lmeh00.png" alt="Serving Model on Kubernetes" width="800" height="886"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Prerequisites
&lt;/h3&gt;

&lt;p&gt;To get this working, we will need the following beforehand.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;kubectl: Make sure you have kubectl installed on your local system.&lt;/li&gt;
&lt;li&gt;Kubernetes cluster: It should have at least two worker nodes, one CPU node and one GPU node.

&lt;ul&gt;
&lt;li&gt;Make sure the GPU node is tainted.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Ray Serve library (optional): Not strictly required, but it should be present for local testing.&lt;/li&gt;
&lt;li&gt;Helm: It will be used for installing charts.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Setting up
&lt;/h3&gt;

&lt;p&gt;1. Install KubeRay via Helm on Kubernetes.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   helm repo add kuberay https://ray-project.github.io/kuberay-helm/
   helm repo update
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   helm &lt;span class="nb"&gt;install &lt;/span&gt;kuberay-operator kuberay/kuberay-operator &lt;span class="nt"&gt;--version&lt;/span&gt; 1.2.1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;   NAME: kuberay-operator
   LAST DEPLOYED: Fri Sep 20 07:44:00 2024
   NAMESPACE: default
   STATUS: deployed
   REVISION: 1
   TEST SUITE: None
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;2. Now, create a Ray Serve application.&lt;br&gt;
   We will wrap deployment and serving in the same Python class, VLLMInference. The vLLM engine is created during initialization, and the tokenizer is loaded. Upon receiving a request on the REST API endpoint /generate, it uses the vLLM-provided chat template and passes the prompt to self.engine.generate, which queues the request if other requests are still being processed. Lastly, the custom GenerateResponse Pydantic model returns responses in a specified format.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;   &lt;span class="nd"&gt;@serve.deployment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;VLLMInference&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                   &lt;span class="n"&gt;num_replicas&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                   &lt;span class="n"&gt;max_concurrent_queries&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;256&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                   &lt;span class="n"&gt;ray_actor_options&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;num_gpus&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
                   &lt;span class="p"&gt;)&lt;/span&gt;
   &lt;span class="nd"&gt;@serve.ingress&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
   &lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;VLLMInference&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
       &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
           &lt;span class="nf"&gt;super&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
           &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AsyncEngineArgs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
           &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;engine&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;AsyncLLMEngine&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_engine_args&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
           &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tokenizer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_prepare_tokenizer&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;


       &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_prepare_tokenizer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,):&lt;/span&gt;
           &lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;transformers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AutoTokenizer&lt;/span&gt;
           &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;trust_remote_code&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
               &lt;span class="n"&gt;tokenizer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;AutoTokenizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;trust_remote_code&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
           &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
               &lt;span class="n"&gt;tokenizer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;AutoTokenizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
           &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;tokenizer&lt;/span&gt;


       &lt;span class="nd"&gt;@app.post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/generate&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;response_model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;GenerateResponse&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
       &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;GenerateRequest&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;raw_request&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;GenerateResponse&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
           &lt;span class="n"&gt;logging&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Received request: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
           &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
               &lt;span class="n"&gt;generation_args&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;exclude&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;prompt&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
               &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;generation_args&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                   &lt;span class="c1"&gt;# Default value
&lt;/span&gt;                   &lt;span class="n"&gt;generation_args&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                       &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                       &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;temperature&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                   &lt;span class="p"&gt;}&lt;/span&gt;

               &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                   &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;
               &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;


                   &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;apply_chat_template&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                       &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                       &lt;span class="n"&gt;tokenize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                       &lt;span class="n"&gt;add_generation_prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;
                   &lt;span class="p"&gt;)&lt;/span&gt;
               &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                   &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Prompt or Messages is required&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


               &lt;span class="n"&gt;sampling_params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SamplingParams&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;generation_args&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


               &lt;span class="n"&gt;request_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_next_request_id&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

               &lt;span class="n"&gt;results_generator&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sampling_params&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;request_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


               &lt;span class="n"&gt;final_result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
               &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results_generator&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                   &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;raw_request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;is_disconnected&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
                       &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;abort&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                       &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;GenerateResponse&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
                   &lt;span class="n"&gt;final_result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;  &lt;span class="c1"&gt;# Store the last result
&lt;/span&gt;               &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;final_result&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                   &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;GenerateResponse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;final_result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;outputs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                                           &lt;span class="n"&gt;finish_reason&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;final_result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;outputs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;finish_reason&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                                           &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;final_result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
               &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                   &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;No results found&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
           &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;ValueError&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
               &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;HTTPException&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;HTTPStatus&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;BAD_REQUEST&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
           &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
               &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Error in generate()&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;exc_info&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
               &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;HTTPException&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;HTTPStatus&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;INTERNAL_SERVER_ERROR&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Server error&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


       &lt;span class="nd"&gt;@staticmethod&lt;/span&gt;
       &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_next_request_id&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
           &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;uuid&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;uuid1&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nb"&gt;hex&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


       &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_abort_request&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;request_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
           &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;abort&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


       &lt;span class="nd"&gt;@app.get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/health&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
       &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;health&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Response&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
           &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Health check.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
           &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;Response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

   &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;deployment_llm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Application&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
       &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;VLLMInference&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;bind&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once the Ray Serve application is ready, push it to the repository.&lt;/p&gt;
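Before moving to Kubernetes, it helps to see what a client request to the /generate route looks like. The sketch below only builds the JSON body; the field names (prompt, messages, max_tokens, temperature) mirror the GenerateRequest model in the application above, and the helper itself is hypothetical, not part of the article's code.

```python
import json

# Hypothetical client helper for the /generate route defined above.
# The field names (prompt, messages, max_tokens, temperature) mirror the
# GenerateRequest model from the Ray Serve application; nothing else here
# is taken from the article.

def build_generate_payload(prompt=None, messages=None, max_tokens=500, temperature=0.1):
    """Build the JSON body expected by /generate.

    Exactly one of `prompt` or `messages` must be provided, matching the
    ValueError the server raises when both are missing.
    """
    if prompt is None and messages is None:
        raise ValueError("Prompt or Messages is required")
    payload = {"max_tokens": max_tokens, "temperature": temperature}
    if prompt is not None:
        payload["prompt"] = prompt
    else:
        payload["messages"] = messages
    return payload

# A plain-text prompt and a chat-style request:
print(json.dumps(build_generate_payload(prompt="Explain Ray Serve in one line.")))
print(json.dumps(build_generate_payload(messages=[{"role": "user", "content": "Hi!"}])))
```

With the service reachable (for example via a port-forward to the serve port, 8000), this body can be POSTed to /generate with curl or any HTTP client.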

&lt;p&gt;3. Now, let’s define the RayService CRD.&lt;/p&gt;

&lt;p&gt;This CRD will help us deploy our Ray Serve application on Kubernetes and configure scaling and other Kubernetes-related parameters.&lt;/p&gt;

&lt;p&gt;Here, I’m providing a name, a route prefix, the import path, and the location of the binding function within the working directory (i.e., our &lt;a href="https://github.com/infracloudio/ray-serve-demo" rel="noopener noreferrer"&gt;GitHub repository&lt;/a&gt;). I’m also passing a few args accepted by the Ray Serve application.&lt;/p&gt;

&lt;p&gt;These args are helpful in the long run: if you want to change a parameter, you won’t have to rewrite the whole application, and the same CRD with different names and args can be reused for any supported model.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;   &lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ray.io/v1&lt;/span&gt;
   &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;RayService&lt;/span&gt;
   &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
     &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;vllm-service&lt;/span&gt;
   &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
     &lt;span class="na"&gt;serveConfigV2&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
       &lt;span class="s"&gt;applications:&lt;/span&gt;
         &lt;span class="s"&gt;- name: VLLMService&lt;/span&gt;
           &lt;span class="s"&gt;route_prefix: /&lt;/span&gt;
           &lt;span class="s"&gt;import_path: ray-serve.vllm_engine:deployment_llm&lt;/span&gt;
           &lt;span class="s"&gt;runtime_env:&lt;/span&gt;
             &lt;span class="s"&gt;working_dir: "https://github.com/infracloudio/ray-serve-demo/archive/28e409b87d2618cdb6f1a2f9f618b66ca896747e.zip"&lt;/span&gt;
             &lt;span class="s"&gt;pip: [ "git+https://github.com/huggingface/transformers", "pydantic", "vllm", "fastapi", "requests"]&lt;/span&gt;
           &lt;span class="s"&gt;args:&lt;/span&gt;
             &lt;span class="s"&gt;model: microsoft/Phi-3-mini-4k-instruct&lt;/span&gt;
             &lt;span class="s"&gt;trust_remote_code: true&lt;/span&gt;
             &lt;span class="s"&gt;dtype: float16&lt;/span&gt;
     &lt;span class="na"&gt;rayClusterConfig&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
       &lt;span class="na"&gt;rayVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;2.30.0'&lt;/span&gt; &lt;span class="c1"&gt;# Should match the Ray version in the image of the containers&lt;/span&gt;
       &lt;span class="c1"&gt;######################headGroupSpecs#################################&lt;/span&gt;
       &lt;span class="c1"&gt;# Ray head pod template.&lt;/span&gt;
       &lt;span class="na"&gt;headGroupSpec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
         &lt;span class="c1"&gt;# The `rayStartParams` are used to configure the `ray start` command.&lt;/span&gt;
         &lt;span class="c1"&gt;# See https://github.com/ray-project/kuberay/blob/master/docs/guidance/rayStartParams.md for the default settings of `rayStartParams` in KubeRay.&lt;/span&gt;
         &lt;span class="c1"&gt;# See https://docs.ray.io/en/latest/cluster/cli.html#ray-start for all available options in `rayStartParams`.&lt;/span&gt;
         &lt;span class="na"&gt;rayStartParams&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
           &lt;span class="na"&gt;dashboard-host&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;0.0.0.0'&lt;/span&gt;
         &lt;span class="c1"&gt;# Pod template&lt;/span&gt;
         &lt;span class="na"&gt;template&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
           &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
             &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
             &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ray-head&lt;/span&gt;
               &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;rayproject/ray-ml:2.30.0&lt;/span&gt;
               &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
               &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;containerPort&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;6379&lt;/span&gt;
                 &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gcs&lt;/span&gt;
               &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;containerPort&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;8265&lt;/span&gt;
                 &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;dashboard&lt;/span&gt;
               &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;containerPort&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10001&lt;/span&gt;
                 &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;client&lt;/span&gt;
               &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;containerPort&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;8000&lt;/span&gt;
                 &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;serve&lt;/span&gt;
               &lt;span class="na"&gt;volumeMounts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                 &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;mountPath&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/tmp/ray&lt;/span&gt;
                   &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ray-logs&lt;/span&gt;
               &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                 &lt;span class="na"&gt;limits&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                   &lt;span class="na"&gt;cpu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2"&lt;/span&gt;
                   &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;8G"&lt;/span&gt;
                 &lt;span class="na"&gt;requests&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                   &lt;span class="na"&gt;cpu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2"&lt;/span&gt;
                   &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;8G"&lt;/span&gt;
               &lt;span class="c1"&gt;# Do not enable if the monitoring stack is not there&lt;/span&gt;
               &lt;span class="c1"&gt;# env:&lt;/span&gt;
               &lt;span class="c1"&gt;# - name: RAY_GRAFANA_IFRAME_HOST&lt;/span&gt;
               &lt;span class="c1"&gt;#   value: http://127.0.0.1:3000&lt;/span&gt;
               &lt;span class="c1"&gt;# - name: RAY_GRAFANA_HOST&lt;/span&gt;
               &lt;span class="c1"&gt;#   value: http://prometheus-grafana.prometheus-system.svc:80&lt;/span&gt;
               &lt;span class="c1"&gt;# - name: RAY_PROMETHEUS_HOST&lt;/span&gt;
               &lt;span class="c1"&gt;#   value: http://prometheus-kube-prometheus-prometheus.prometheus-system.svc:9090&lt;/span&gt;
             &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
               &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ray-logs&lt;/span&gt;
                 &lt;span class="na"&gt;emptyDir&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{}&lt;/span&gt;
       &lt;span class="na"&gt;workerGroupSpecs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
       &lt;span class="c1"&gt;# The pod replicas in this group typed worker&lt;/span&gt;
       &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;replicas&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
         &lt;span class="na"&gt;minReplicas&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
         &lt;span class="na"&gt;maxReplicas&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
         &lt;span class="na"&gt;groupName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gpu-group&lt;/span&gt;
         &lt;span class="na"&gt;rayStartParams&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{}&lt;/span&gt;
         &lt;span class="c1"&gt;# Pod template&lt;/span&gt;
         &lt;span class="na"&gt;template&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
           &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
             &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
             &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ray-worker&lt;/span&gt;
               &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;rayproject/ray-ml:2.30.0&lt;/span&gt;
               &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                 &lt;span class="na"&gt;limits&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                   &lt;span class="na"&gt;cpu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;4&lt;/span&gt;
                   &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;16G"&lt;/span&gt;
                   &lt;span class="na"&gt;nvidia.com/gpu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
                 &lt;span class="na"&gt;requests&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                   &lt;span class="na"&gt;cpu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;
                   &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;12G"&lt;/span&gt;
                   &lt;span class="na"&gt;nvidia.com/gpu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
             &lt;span class="c1"&gt;# Please add the following taints to the GPU node.&lt;/span&gt;
             &lt;span class="na"&gt;tolerations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
               &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;nvidia.com/gpu"&lt;/span&gt;
                 &lt;span class="na"&gt;operator&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Equal"&lt;/span&gt;
                 &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;present"&lt;/span&gt;
                 &lt;span class="na"&gt;effect&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;NoSchedule"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In the worker's configuration, we have defined limits, requests, and tolerations in the same resource format that Kubernetes expects. The taint on the GPU node and the matching toleration on the worker pod keep CPU-only workloads off GPU nodes: only pods that request a GPU and carry the toleration are scheduled there, which avoids wasting GPU capacity.&lt;/p&gt;

&lt;p&gt;Lastly, enable the commented-out env section in the Ray head configuration only if you have a monitoring stack in the cluster. The source code of the above RayService CRD and Ray Serve application can be found in &lt;a href="https://github.com/infracloudio/ray-serve-demo" rel="noopener noreferrer"&gt;this repository&lt;/a&gt;.&lt;/p&gt;
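Once the RayService manifest is applied with kubectl, the /health route wired up in the application gives a quick way to confirm the deployment is serving. A minimal sketch, assuming the serve port has been port-forwarded to localhost:8000; the URL is an assumption for a local setup, not something from the article.

```python
import urllib.request
import urllib.error

# Minimal readiness probe against the /health route of the deployed app.
# The localhost URL assumes the serve port (8000) has been port-forwarded
# from the cluster; adjust it to match your environment.

def service_ready(url="http://localhost:8000/health", timeout=2.0):
    """Return True if the /health route answers HTTP 200, else False."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, or timeout: not ready yet.
        return False

print(service_ready(timeout=1.0))
```

Polling this in a loop is handy right after `kubectl apply`, since the Ray cluster can take a few minutes to pull images and start serving.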

&lt;p&gt;4. Deploying the monitoring stack.&lt;/p&gt;

&lt;p&gt;To deploy the monitoring stack, you can follow these docs: &lt;a href="https://docs.ray.io/en/latest/cluster/kubernetes/k8s-ecosystem/prometheus-grafana.html#kuberay-prometheus-grafana" rel="noopener noreferrer"&gt;Using Prometheus and Grafana&lt;/a&gt;. KubeRay provides an install.sh script that automatically installs the Prometheus chart and the related custom resources in the prometheus-system namespace. If you don’t already have a monitoring stack, this eases the setup.&lt;/p&gt;

&lt;p&gt;To install, clone the &lt;a href="https://github.com/ray-project/kuberay" rel="noopener noreferrer"&gt;ray-project/kuberay&lt;/a&gt; repository and check out the master branch. From the repository root, run the command below.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;    &lt;span class="c"&gt;# Path: kuberay/&lt;/span&gt;
    ./install/prometheus/install.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;   $ kuberay git:(master) ./install/prometheus/install.sh
   + set errexit
   + helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
   "prometheus-community" already exists with the same configuration, skipping
   + helm repo update
   Hang tight while we grab the latest from your chart repositories...
   ...Successfully got an update from the "metrics-server" chart repository
   ...Successfully got an update from the "kuberay" chart repository
   ...Successfully got an update from the "prometheus-community" chart repository
   Update Complete. ⎈Happy Helming!⎈
   +++ dirname ./install/prometheus/install.sh
   ++ cd ./install/prometheus
   ++ pwd
   + DIR=/home/sudhanshu/Desktop/workspace/ray-demo/kuberay/install/prometheus
   + helm --namespace prometheus-system install prometheus prometheus-community/kube-prometheus-stack --create-namespace --version 48.2.1 -f /home/sudhanshu/Desktop/workspace/ray-demo/kuberay/install/prometheus/overrides.yaml
   NAME: prometheus
   LAST DEPLOYED: Mon Sep 23 07:53:55 2024
   NAMESPACE: prometheus-system
   STATUS: deployed
   REVISION: 1
   NOTES:
   kube-prometheus-stack has been installed. Check its status by running:
   kubectl --namespace prometheus-system get pods -l "release=prometheus"

   Visit https://github.com/prometheus-operator/kube-prometheus for instructions on how to create &amp;amp; configure Alertmanager and Prometheus instances using the Operator.
   + monitor_dir=/home/sudhanshu/Desktop/workspace/ray-demo/kuberay/install/prometheus/../../config/prometheus
   + pushd /home/sudhanshu/Desktop/workspace/ray-demo/kuberay/install/prometheus/../../config/prometheus
   ~/Desktop/workspace/ray-demo/kuberay/config/prometheus ~/Desktop/workspace/ray-demo/kuberay
   ++ ls
   + for file in `ls`
   + kubectl apply -f podMonitor.yaml
   podmonitor.monitoring.coreos.com/ray-workers-monitor created
   + for file in `ls`
   + kubectl apply -f rules
   prometheusrule.monitoring.coreos.com/ray-cluster-gcs-rules created
   + for file in `ls`
   + kubectl apply -f serviceMonitor.yaml
   servicemonitor.monitoring.coreos.com/ray-head-monitor created
   + popd
   ~/Desktop/workspace/ray-demo/kuberay
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Check that all the monitoring resources are up and running.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   &lt;span class="nv"&gt;$ &lt;/span&gt; kuberay git:&lt;span class="o"&gt;(&lt;/span&gt;master&lt;span class="o"&gt;)&lt;/span&gt; kubectl get all &lt;span class="nt"&gt;-n&lt;/span&gt; prometheus-system
   NAME                                                         READY   STATUS    RESTARTS  AGE
   pod/alertmanager-prometheus-kube-prometheus-alertmanager-0   2/2     Running   0         114s
   pod/prometheus-grafana-54cddddd76-r8jqp                      3/3     Running   0         2m2s
   pod/prometheus-kube-prometheus-operator-96f59f654-9vbxc      1/1     Running   0         2m2s
   pod/prometheus-kube-state-metrics-786fbd7c69-9xdtk           1/1     Running   0         2m2s
   pod/prometheus-prometheus-kube-prometheus-prometheus-0       2/2     Running   0         113s
   pod/prometheus-prometheus-node-exporter-77kkn                1/1     Running   0         2m2s
   pod/prometheus-prometheus-node-exporter-89dc5                1/1     Running   0         2m2s

   NAME                                             TYPE        CLUSTER-IP      EXTERNAL-IP   PORT&lt;span class="o"&gt;(&lt;/span&gt;S&lt;span class="o"&gt;)&lt;/span&gt;                     AGE
   service/alertmanager-operated                    ClusterIP   None            &amp;lt;none&amp;gt;        9093/TCP,9094/TCP,9094/UDP  115s
   service/prometheus-grafana                       ClusterIP   34.118.226.253  &amp;lt;none&amp;gt;        80/TCP                      2m3s
   service/prometheus-kube-prometheus-alertmanager  ClusterIP   34.118.231.161  &amp;lt;none&amp;gt;        9093/TCP,8080/TCP           2m3s
   service/prometheus-kube-prometheus-operator      ClusterIP   34.118.234.87   &amp;lt;none&amp;gt;        443/TCP                     2m3s
   service/prometheus-kube-prometheus-prometheus    ClusterIP   34.118.236.54   &amp;lt;none&amp;gt;        9090/TCP,8080/TCP           2m3s
   service/prometheus-kube-state-metrics            ClusterIP   34.118.232.116  &amp;lt;none&amp;gt;        8080/TCP                    2m3s
   service/prometheus-operated                      ClusterIP   None            &amp;lt;none&amp;gt;        9090/TCP                    114s
   service/prometheus-prometheus-node-exporter      ClusterIP   34.118.225.149  &amp;lt;none&amp;gt;        9100/TCP                    2m3s

   NAME                                                 DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE
   daemonset.apps/prometheus-prometheus-node-exporter   2         2         2       2            2           kubernetes.io/os&lt;span class="o"&gt;=&lt;/span&gt;linux   2m3s

   NAME                                                 READY   UP-TO-DATE   AVAILABLE  AGE
   deployment.apps/prometheus-grafana                   1/1     1           1           2m3s
   deployment.apps/prometheus-kube-prometheus-operator  1/1     1           1           2m3s
   deployment.apps/prometheus-kube-state-metrics        1/1     1           1           2m3s

   NAME                                                             DESIRED   CURRENT   READY   AGE
   replicaset.apps/prometheus-grafana-54cddddd76                    1         1         1       2m3s
   replicaset.apps/prometheus-kube-prometheus-operator-96f59f654    1         1         1       2m3s
   replicaset.apps/prometheus-kube-state-metrics-786fbd7c69         1         1         1       2m3s

   NAME                                                                     READY   AGE
   statefulset.apps/alertmanager-prometheus-kube-prometheus-alertmanager    1/1     115s
   statefulset.apps/prometheus-prometheus-kube-prometheus-prometheus        1/1     114s

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;5. Now, deploy the RayService.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Note: In the YAML configuration, uncomment the monitoring-related entries under the ray-head container's env and set them to the correct values.&lt;/em&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;   &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
   &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;RAY_GRAFANA_IFRAME_HOST&lt;/span&gt;
     &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;http://127.0.0.1:3000&lt;/span&gt;
   &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;RAY_GRAFANA_HOST&lt;/span&gt;
     &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;http://prometheus-grafana.prometheus-system.svc:80&lt;/span&gt;
   &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;RAY_PROMETHEUS_HOST&lt;/span&gt;
     &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;http://prometheus-kube-prometheus-prometheus.prometheus-system.svc:9090&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To deploy, apply the YAML to the cluster.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   kubectl apply &lt;span class="nt"&gt;-f&lt;/span&gt; vllm-service-phi-3-mini-4k.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;   rayservice.ray.io/vllm-service created
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It will take some time to pull the images, since the rayproject/ray-ml:2.30.0 image is quite large (you could try building a smaller image based on it, as mentioned &lt;a href="https://github.com/ray-project/ray/issues/46378" rel="noopener noreferrer"&gt;here&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;6. Behind the scenes of RayService.&lt;/p&gt;

&lt;p&gt;When you deploy a RayService CRD in your Kubernetes cluster, a coordinated series of events unfolds to set up your Ray cluster.&lt;/p&gt;

&lt;p&gt;The process starts when Kubernetes accepts your RayService definition. The KubeRay operator, which monitors for such resources, notices the new CRD and kicks into action. It reads your specifications and translates them into a RayCluster CRD detailing how the head and worker nodes should be configured.&lt;/p&gt;

&lt;p&gt;Next, the operator creates the necessary Kubernetes resources:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Deployments&lt;/strong&gt; for the Ray head node.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ReplicaSets&lt;/strong&gt; and &lt;strong&gt;Pods&lt;/strong&gt; for the worker nodes, matching the number of &lt;strong&gt;replicas&lt;/strong&gt; you've specified. If your configuration includes scaling, the operator adjusts the number of worker replicas based on workload demands, either through manual settings or Ray's autoscaler.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Services&lt;/strong&gt; to enable communication between nodes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ingress&lt;/strong&gt; resources if external access is needed.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Kubernetes schedules these pods onto cluster nodes, considering resource requests and any scheduling rules we've set, like node affinities or tolerations. The Ray head node initializes the cluster as the pods come online and worker nodes connect.&lt;/p&gt;

&lt;p&gt;Ray actors—stateful work units—are scheduled across the worker nodes within the cluster. Ray's internal scheduler handles this, optimizing resource availability and workload distribution.&lt;/p&gt;

&lt;p&gt;When you change the RayService CRD spec, the operator deploys new pods with the updated settings and gradually shifts traffic to them, ensuring no downtime. Old pods are cleaned up once the new ones are running smoothly.&lt;/p&gt;

&lt;p&gt;7. Ray Dashboard.&lt;/p&gt;

&lt;p&gt;Once deployed, you can view the Ray Serve application and all its related Actors in the Ray Dashboard, which becomes available after forwarding the Ray head service to local port 8265.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   kubectl port-forward svc/vllm-service-head-svc 8265:8265
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To see the metrics scraped by Prometheus and embed Grafana visualizations in the Ray Dashboard, you need to port-forward the Grafana service.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   kubectl port-forward deployment/prometheus-grafana &lt;span class="nt"&gt;-n&lt;/span&gt; prometheus-system 3000:3000
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;Note: The admin password is set in the Helm values override file, kuberay/install/prometheus/overrides.yaml.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Once you open Grafana, load the preset dashboard that ships with the KubeRay repo in the kuberay/config/grafana directory. Here we import serve_deployment_grafana_dashboard.json, which looks like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp3md5laefj6ddu1k1arc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp3md5laefj6ddu1k1arc.png" alt="Ray Dashboard" width="800" height="788"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;8. Sending API requests to the deployed model.&lt;/p&gt;

&lt;p&gt;Send a request to the deployed LLM inference server. To do that, port-forward the service to local port 8000.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   kubectl port-forward svc/vllm-service-serve-svc 8000
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, send the curl request from the terminal or from Postman, whichever suits you best.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   curl &lt;span class="nt"&gt;--location&lt;/span&gt; &lt;span class="nt"&gt;--request&lt;/span&gt; POST &lt;span class="s1"&gt;'http://127.0.0.1:8000/generate'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
   &lt;span class="nt"&gt;--header&lt;/span&gt; &lt;span class="s1"&gt;'Content-Type: application/json'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
   &lt;span class="nt"&gt;--data-raw&lt;/span&gt; &lt;span class="s1"&gt;'{
      "prompt": "&amp;lt;|user|&amp;gt;\n&amp;lt;|user|&amp;gt;\n What are Large Language Models?&amp;lt;|end|&amp;gt;\n&amp;lt;|assistant|&amp;gt;",
      "messages": [],
      "max_tokens": 500,
      "temperature": 0.1
   }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, the number of tokens to generate is 500 and the temperature is set to 0.1; you can change these values and experiment to find what works best for you.&lt;/p&gt;
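&lt;p&gt;If you prefer scripting the request instead of curl, the same call can be made from Python. Below is a minimal sketch using only the standard library; the endpoint, port, and payload fields mirror the curl example above, and it assumes the port-forward is still running.&lt;br&gt;
&lt;/p&gt;

```python
import json
from urllib import request

def build_payload(prompt, max_tokens=500, temperature=0.1):
    # Assemble the JSON body expected by the /generate endpoint above.
    return {
        "prompt": prompt,
        "messages": [],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }

def generate(prompt, url="http://127.0.0.1:8000/generate"):
    # POST the prompt to the inference server and return the parsed response.
    body = json.dumps(build_payload(prompt)).encode("utf-8")
    req = request.Request(url, data=body,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.loads(resp.read())

# With the port-forward running:
# print(generate("What are Large Language Models?"))
```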

&lt;h4&gt;
  
  
  Sending multiple messages/chat format
&lt;/h4&gt;

&lt;p&gt;To send multiple messages, similar to a chat conversation with history as context, you can use the curl request below.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;--location&lt;/span&gt; &lt;span class="nt"&gt;--request&lt;/span&gt; POST &lt;span class="s1"&gt;'http://127.0.0.1:8000/generate'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="nt"&gt;--header&lt;/span&gt; &lt;span class="s1"&gt;'Content-Type: application/json'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="nt"&gt;--data-raw&lt;/span&gt; &lt;span class="s1"&gt;'{
    "prompt": "",
    "messages": [
        {
            "role": "user",
            "content": "Can you provide ways to eat combinations of bananas and dragonfruits?"
        },
        {
            "role": "assistant",
            "content": "Sure! Here are some ways to eat bananas and dragonfruits together: 1. Banana and dragonfruit smoothie: Blend bananas and dragonfruits together with some milk and honey. 2. Banana and dragonfruit salad: Mix sliced bananas and dragonfruits together with some lemon juice and honey."
        },
        {
            "role": "user",
            "content": "What about solving an 2x + 3 = 7 equation?"
        }
    ],
    "max_tokens": 500,
    "temperature": 0.1
}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Monitoring model performance
&lt;/h3&gt;

&lt;h4&gt;
  
  
  With Ray Dashboard
&lt;/h4&gt;

&lt;p&gt;Ray Dashboard serves as a comprehensive monitoring tool for Ray clusters, providing live updates on service health, application deployments, resource consumption, and node-level diagnostics, which are crucial for managing distributed workloads.&lt;/p&gt;

&lt;p&gt;As you can see in the Serve tab, VLLMService is created with vLLM Inference as part of it, and logs are available in case you need to dig deeper into something.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2s19y6ani64s0ryi6nho.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2s19y6ani64s0ryi6nho.png" alt="Serve Ray Dashboard" width="800" height="788"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The cluster tab gives you an overview of the cluster once you click the link on the cluster name, from overall resource usage down to what is consuming which resources. Toggling the view from table to card shows your memory and GPU resources.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvgbh6yl9ybgibc7jsmef.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvgbh6yl9ybgibc7jsmef.png" alt="Cluster Ray Dashboard" width="800" height="788"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The Serve replica, which we set to 1 for our application, is deployed, and we can see its logs in the Ray Dashboard under Actors. As stated earlier, the Actor is the stateful unit of work.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff90n2ob5ajkyfezmfhll.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff90n2ob5ajkyfezmfhll.png" alt="Logs in Ray Dashboard under Actors" width="800" height="788"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwodsmrqf9bm3vxbb2nbf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwodsmrqf9bm3vxbb2nbf.png" alt="Logs in Ray Dashboard" width="800" height="788"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  With a monitoring stack
&lt;/h4&gt;

&lt;p&gt;With Grafana and Prometheus in place, you can get more information, such as the QPS (queries per second) of each service and of each replica if you have more than one, along with an overall view of the deployed application, i.e., the VLLM Service. This monitoring setup reduces the operational burden and provides more than enough metrics when you are starting out with Ray on Kubernetes.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr0pulwusz0pgao2p9wh1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr0pulwusz0pgao2p9wh1.png" alt="Serve deployment dashboard" width="800" height="788"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is it! You’ve deployed your model and monitored Ray Cluster running on Kubernetes with KubeRay.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The model serving space is still evolving, but Ray Serve provides a good starting point for anyone looking to serve a model without being locked into a particular inference or training framework or model format. It allows users to choose and deploy any inference library, such as TensorRT, vLLM, etc.&lt;/p&gt;

&lt;p&gt;Apart from this, many big companies use the Ray ecosystem to scale and build AI infrastructure.  If you’re looking for experts who can help you scale or build your AI infrastructure, reach out to our &lt;a href="https://www.infracloud.io/build-ai-cloud/" rel="noopener noreferrer"&gt;AI &amp;amp; GPU Cloud experts&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you found this post valuable and informative, subscribe to our weekly newsletter for more posts like this. I’d love to hear your thoughts on this post, so do start a conversation on &lt;a href="https://www.linkedin.com/in/sudhanshu212/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Read More
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.infracloud.io/blogs/running-llama-3-with-triton-tensorrt-llm/" rel="noopener noreferrer"&gt;Running Llama 3 with Triton and TensorRT-LLM&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.infracloud.io/blogs/introduction-to-nvidia-network-operator/" rel="noopener noreferrer"&gt;Introduction to NVIDIA Network Operator&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.infracloud.io/blogs/retrieval-augmented-generation-using-data-with-llms/" rel="noopener noreferrer"&gt;Retrieval-Augmented Generation: Using your Data with LLMs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.infracloud.io/blogs/gpu-sharing-techniques-guide-vgpu-mig-time-slicing/" rel="noopener noreferrer"&gt;Guide to GPU Sharing Techniques: vGPU, MIG and Time Slicing&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ray</category>
      <category>llm</category>
      <category>kubernetes</category>
      <category>aiops</category>
    </item>
    <item>
      <title>Primer on Distributed Parallel Processing with Ray using KubeRay</title>
      <dc:creator>Sudhanshu Prajapati</dc:creator>
      <pubDate>Fri, 08 Nov 2024 10:34:54 +0000</pubDate>
      <link>https://dev.to/infracloud/primer-on-distributed-parallel-processing-with-ray-using-kuberay-3i31</link>
      <guid>https://dev.to/infracloud/primer-on-distributed-parallel-processing-with-ray-using-kuberay-3i31</guid>
      <description>&lt;p&gt;In the early days of computing, applications handled tasks sequentially. As the scale grew with millions of users, this approach became impractical. Asynchronous processing allowed handling multiple tasks concurrently, but managing threads/processes on a single machine led to resource constraints and complexity.&lt;/p&gt;

&lt;p&gt;This is where distributed parallel processing comes in. By spreading the workload across multiple machines, each dedicated to a portion of the task, it offers a scalable and efficient solution. If you have a function to process a large batch of files, you can divide the workload across multiple machines to process files concurrently instead of handling them sequentially on one machine. Additionally, it improves performance by leveraging combined resources and provides scalability and fault tolerance. As the demands increase, you can add more machines to increase available resources. &lt;/p&gt;
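&lt;p&gt;As a single-machine analogy (not Ray itself), the fan-out idea can be sketched in plain Python. Here, process_file is a hypothetical stand-in for real per-file work, and a thread pool plays the role of the worker machines.&lt;br&gt;
&lt;/p&gt;

```python
from concurrent.futures import ThreadPoolExecutor

def process_file(name):
    # Hypothetical per-file work; real code might parse, transform, or upload.
    return "processed:" + name

def process_batch(files, workers=4):
    # Fan the batch out across the pool; map preserves input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process_file, files))

print(process_batch(["a.txt", "b.txt", "c.txt"]))
# ['processed:a.txt', 'processed:b.txt', 'processed:c.txt']
```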

&lt;p&gt;It is challenging to build and run distributed applications on scale, but there are several frameworks and tools to help you out. In this blog post, we'll examine one such open source distributed computing framework: Ray. We'll also look at KubeRay, a &lt;a href="https://www.infracloud.io/extending-kubernetes-comprehensive-guide-whitepaper/" rel="noopener noreferrer"&gt;Kubernetes operator&lt;/a&gt; that enables seamless Ray integration with Kubernetes clusters for distributed computing in cloud native environments. But first, let's understand where distributed parallelism helps.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where does distributed parallel processing help?
&lt;/h2&gt;

&lt;p&gt;Any task that benefits from splitting its workload across multiple machines can utilize distributed parallel processing. This approach is particularly useful for scenarios such as web crawling, large-scale data analytics, machine learning model training, real-time stream processing, genomic data analysis, and video rendering. By distributing tasks across multiple nodes, distributed parallel processing significantly enhances performance, reduces processing time, and optimizes resource utilization, making it essential for applications that require high throughput and rapid data handling.&lt;/p&gt;

&lt;h3&gt;
  
  
  When is distributed parallel processing not needed?
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Small-scale applications&lt;/strong&gt;: For small datasets or applications with minimal processing requirements, the overhead of managing a distributed system may not be justified.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Strong data dependencies&lt;/strong&gt;: If tasks are highly interdependent and cannot be easily parallelized, distributed processing may offer little benefit.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Real-time constraints&lt;/strong&gt;: Some real-time applications (e.g., finance and ticket booking websites) require extremely low latency, which might not be achievable with the added complexity of a distributed system.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Limited resources&lt;/strong&gt;: If the available infrastructure cannot support the overhead of a distributed system (e.g., insufficient network bandwidth, limited number of nodes), it may be better to optimize single-machine performance.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  How does Ray help with distributed parallel processing?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://www.ray.io/" rel="noopener noreferrer"&gt;Ray&lt;/a&gt; is a Distributed Parallel Processing framework that encapsulates all the benefits of distributed computing and solutions to challenges we discussed, such as fault tolerance, scalability, context management, communication, and so on. It is a &lt;a href="https://docs.ray.io/en/latest/ray-overview/index.html" rel="noopener noreferrer"&gt;pythonic framework&lt;/a&gt;, allowing the use of existing libraries and systems to work with it. With Ray's help, a programmer doesn’t need to handle the pieces of the parallel processing compute layer. Ray will take care of scheduling and autoscaling based on the specified resource requirements.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpjczk4lh3fsprnjwht31.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpjczk4lh3fsprnjwht31.png" alt="Ray provides a universal API of tasks, actors, and objects for building distributed applications." width="800" height="271"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;(&lt;a href="https://www.google.com/url?q=https://docs.google.com/document/d/1tBw9A4j62ruI5omIJbMxly-la5w4q_TjyJgJL_jN2fI/preview&amp;amp;sa=D&amp;amp;source=docs&amp;amp;ust=1726563738037328&amp;amp;usg=AOvVaw3NuNFwmYwlIsXr4D40lQWE" rel="noopener noreferrer"&gt;Image Source&lt;/a&gt;: Ray provides a universal API of tasks, actors, and objects for building distributed applications.)&lt;/p&gt;

&lt;p&gt;Ray provides a set of libraries built on the core primitives, i.e., Tasks, Actors, Objects, Drivers, and Jobs. These provide a versatile API to help build distributed applications. Let’s take a look at the core primitives, a.k.a. Ray Core.&lt;/p&gt;

&lt;h3&gt;
  
  
  Ray Core primitives
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Tasks&lt;/strong&gt;: Ray tasks are arbitrary Python functions that are executed asynchronously on separate Python workers on a Ray cluster node. Users can specify their resource requirements in terms of CPUs, GPUs, and custom resources which are used by the cluster scheduler to distribute tasks for parallelized execution.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Actors&lt;/strong&gt;: What tasks are to functions, actors are to classes. An actor is a stateful worker, and the methods of an actor are scheduled on that specific worker and can access and mutate the state of that worker. Like tasks, actors support CPU, GPU, and custom resource requirements.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Objects&lt;/strong&gt;: In Ray, tasks and actors create and compute objects. These remote objects can be stored anywhere in a Ray cluster. Object References are used to refer to them, and they are cached in Ray's distributed shared memory object store.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Drivers&lt;/strong&gt;: The program root, or the “main” program. This is the code that runs &lt;code&gt;ray.init()&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Jobs&lt;/strong&gt;: The collection of tasks, objects, and actors originating (recursively) from the same driver and their runtime environment.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For more information about these primitives, you can go through the &lt;a href="https://docs.ray.io/en/latest/ray-core/walkthrough.html" rel="noopener noreferrer"&gt;Ray Core documentation&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Ray Core key methods
&lt;/h3&gt;

&lt;p&gt;Below are some of the key methods within Ray Core that are commonly used:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;ray.init()&lt;/strong&gt; - Start Ray runtime and connect to the Ray cluster.&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ray&lt;/span&gt;
&lt;span class="n"&gt;ray&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;init&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;@ray.remote&lt;/strong&gt; - Decorator that specifies a Python function or class to be executed as a task (remote function) or actor (remote class) in a different process.&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@ray.remote&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;remote_function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;.remote&lt;/strong&gt; - Suffix appended to remote function and class calls; remote operations are asynchronous.&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;result_ref&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;remote_function&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;remote&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;ray.put()&lt;/strong&gt; - Put an object in the in-memory object store; returns an object reference used to pass the object to any remote function or method call.&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;data_ref&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ray&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;put&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;ray.get()&lt;/strong&gt; - Get a remote object(s) from the object store by specifying the object reference(s).&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ray&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result_ref&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;original_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ray&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data_ref&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;An example of using most of the basic key methods:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ray&lt;/span&gt;

&lt;span class="n"&gt;ray&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;init&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="nd"&gt;@ray.remote&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;calculate_square&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;

&lt;span class="c1"&gt;# Using .remote to create a task
&lt;/span&gt;&lt;span class="n"&gt;future&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;calculate_square&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;remote&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Get the result
&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ray&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;future&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;The square of 5 is: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  How does Ray work?
&lt;/h3&gt;

&lt;p&gt;Ray Cluster is like a team of computers that share the work of running a program. It consists of a head node and multiple worker nodes. The head node manages the cluster state and scheduling, while worker nodes execute tasks and manage actors.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi7fl76d1azazoh9y1bit.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi7fl76d1azazoh9y1bit.png" alt="A Ray cluster" width="800" height="304"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.ray.io/en/latest/cluster/key-concepts.html" rel="noopener noreferrer"&gt;(A Ray cluster)&lt;/a&gt; &lt;/p&gt;

&lt;h4&gt;
  
  
  Ray Cluster components
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Global Control Store (GCS)&lt;/strong&gt;: The GCS manages the metadata and global state of the Ray cluster. It tracks tasks, actors, and resource availability, ensuring that all nodes have a consistent view of the system.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scheduler&lt;/strong&gt;: The scheduler distributes tasks and actors across available nodes. It ensures efficient resource utilization and load balancing by considering resource requirements and task dependencies.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Head node&lt;/strong&gt;: The head node orchestrates the entire Ray cluster. It runs the GCS, handles task scheduling, and monitors the health of worker nodes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Worker nodes&lt;/strong&gt;: Worker nodes execute tasks and actors. They perform the actual computations and store objects in their local memory.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Raylet&lt;/strong&gt;: A per-node process that manages shared resources on its node; it is shared among all jobs running concurrently on that node.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can check out the &lt;a href="https://docs.google.com/document/d/1tBw9A4j62ruI5omIJbMxly-la5w4q_TjyJgJL_jN2fI/preview" rel="noopener noreferrer"&gt;Ray v2 Architecture doc&lt;/a&gt; for more detailed information.&lt;/p&gt;

&lt;p&gt;Working with existing Python applications doesn’t require many changes. The changes are mainly around the function or class that needs to be distributed: you add a decorator to convert it into a task or an actor. Let’s see an example of this.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Converting a Python function into Ray Task&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# (Normal Python function)
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;square&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;

&lt;span class="c1"&gt;# Usage
&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;square&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Output: [0, 1, 4, 9]
&lt;/span&gt;

&lt;span class="c1"&gt;# (Ray Implementation)
# Define the square task.
&lt;/span&gt;&lt;span class="nd"&gt;@ray.remote&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;square&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;

&lt;span class="c1"&gt;# Launch four parallel square tasks.
&lt;/span&gt;&lt;span class="n"&gt;futures&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;square&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;remote&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;
&lt;span class="c1"&gt;# Retrieve results.
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ray&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;futures&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="c1"&gt;# -&amp;gt; [0, 1, 4, 9]
&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
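&lt;p&gt;The deferred-result pattern above (submit first, collect later) is the same futures idea found in Python’s standard library, which Ray’s API resembles. The sketch below mimics it with &lt;code&gt;concurrent.futures&lt;/code&gt; as an analogy only: a thread pool stays on one machine, while Ray schedules tasks across processes and nodes.&lt;/p&gt;

```python
# An analogy for Ray's task pattern using only the standard library:
# submission returns a future immediately; collecting blocks until done.
# This runs on one machine, unlike Ray, which distributes across nodes.
from concurrent.futures import ThreadPoolExecutor

def square(x):
    return x * x

with ThreadPoolExecutor() as pool:
    # Like square.remote(i): each submit returns a future right away.
    futures = [pool.submit(square, i) for i in range(4)]
    # Like ray.get(futures): wait for and gather all results.
    results = [f.result() for f in futures]

print(results)  # -> [0, 1, 4, 9]
```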



&lt;p&gt;&lt;strong&gt;Converting a Python Class into Ray Actor&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# (Regular Python class)
&lt;/span&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Counter&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;incr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;

&lt;span class="c1"&gt;# Create an instance of the Counter class
&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Counter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Call the incr method on the instance
&lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;incr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Get the final state of the counter
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;  &lt;span class="c1"&gt;# Output: 10
&lt;/span&gt;
&lt;span class="c1"&gt;# (Ray implementation in actor)
# Define the Counter actor.
&lt;/span&gt;&lt;span class="nd"&gt;@ray.remote&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Counter&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;incr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;

&lt;span class="c1"&gt;# Create a Counter actor.
&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Counter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;remote&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Submit calls to the actor. These
# calls run asynchronously but in
# submission order on the remote actor
# process.
&lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;incr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;remote&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Retrieve final actor state.
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ray&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;remote&lt;/span&gt;&lt;span class="p"&gt;()))&lt;/span&gt;
&lt;span class="c1"&gt;# -&amp;gt; 10
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
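&lt;p&gt;What makes the actor version safe is that the actor processes calls one at a time, in submission order, so its state never needs locks. The sketch below imitates that with a mailbox queue drained by a single worker thread; it illustrates the pattern only, not how Ray implements actors.&lt;/p&gt;

```python
# An analogy for the actor pattern: all state mutations go through one
# mailbox, so a single worker applies them one at a time, in order.
import queue
import threading

class CounterActor:
    def __init__(self):
        self.i = 0
        self._mailbox = queue.Queue()
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        while True:
            value = self._mailbox.get()
            self.i += value            # only this thread touches self.i
            self._mailbox.task_done()

    def incr(self, value):
        # Like c.incr.remote(1): enqueue the call and return immediately.
        self._mailbox.put(value)

    def get(self):
        # Like ray.get(c.get.remote()): wait for queued calls to finish.
        self._mailbox.join()
        return self.i

c = CounterActor()
for _ in range(10):
    c.incr(1)
print(c.get())  # -> 10
```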



&lt;p&gt;&lt;strong&gt;Storing information in Ray Objects&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;

&lt;span class="c1"&gt;# (Regular Python function)
# Define a function that sums the values in a matrix
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;sum_matrix&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;matrix&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;matrix&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Call the function with a literal argument value
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;sum_matrix&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ones&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;))))&lt;/span&gt;  &lt;span class="c1"&gt;# Output: 10000.0
&lt;/span&gt;
&lt;span class="c1"&gt;# Create a large array
&lt;/span&gt;&lt;span class="n"&gt;matrix&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ones&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="c1"&gt;# Call the function with the large array
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;sum_matrix&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;matrix&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;  &lt;span class="c1"&gt;# Output: 1000000.0
&lt;/span&gt;

&lt;span class="c1"&gt;# (Ray implementation of function)
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;

&lt;span class="c1"&gt;# Define a task that sums the values in a matrix.
&lt;/span&gt;&lt;span class="nd"&gt;@ray.remote&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;sum_matrix&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;matrix&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;matrix&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Call the task with a literal argument value.
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ray&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sum_matrix&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;remote&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ones&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)))))&lt;/span&gt;
&lt;span class="c1"&gt;# -&amp;gt; 10000.0
&lt;/span&gt;
&lt;span class="c1"&gt;# Put a large array into the object store.
&lt;/span&gt;&lt;span class="n"&gt;matrix_ref&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ray&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;put&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ones&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;

&lt;span class="c1"&gt;# Call the task with the object reference as argument.
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ray&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sum_matrix&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;remote&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;matrix_ref&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;
&lt;span class="c1"&gt;# -&amp;gt; 1000000.0
&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
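&lt;p&gt;The benefit of &lt;code&gt;ray.put&lt;/code&gt; is that the large array is stored once in the shared object store and tasks receive only a small reference, instead of the array being copied into every call. The dict-backed sketch below illustrates that idea in plain Python; it is an analogy, not Ray’s actual object store.&lt;/p&gt;

```python
# A toy "object store": put() saves the data once and returns a small
# reference; get() resolves the reference. Illustration only, not how
# Ray's shared-memory object store is implemented.
_object_store = {}

def put(obj):
    ref = f"obj-{len(_object_store)}"  # stand-in for Ray's ObjectRef
    _object_store[ref] = obj
    return ref

def get(ref):
    return _object_store[ref]

# Store the 1000x1000 matrix once; tasks would receive only the ref.
matrix_ref = put([[1.0] * 1000 for _ in range(1000)])
total = sum(sum(row) for row in get(matrix_ref))
print(total)  # -> 1000000.0
```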



&lt;p&gt;To learn more about these concepts, head over to the &lt;a href="https://docs.ray.io/en/master/ray-core/key-concepts.html" rel="noopener noreferrer"&gt;Ray Core Key Concepts&lt;/a&gt; docs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Ray vs the traditional approach to distributed parallel processing
&lt;/h2&gt;

&lt;p&gt;Below is a comparative analysis of the traditional approach (without Ray) vs Ray on Kubernetes for enabling distributed parallel processing.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;Traditional Approach&lt;/th&gt;
&lt;th&gt;Ray on Kubernetes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Deployment&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Manual setup and configuration&lt;/td&gt;
&lt;td&gt;Automated with KubeRay Operator&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Scaling&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Manual scaling&lt;/td&gt;
&lt;td&gt;Automatic scaling with RayAutoScaler and Kubernetes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Fault Tolerance&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Custom fault tolerance mechanisms&lt;/td&gt;
&lt;td&gt;Built-in fault tolerance with Kubernetes and Ray&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Resource Management&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Manual resource allocation&lt;/td&gt;
&lt;td&gt;Automated resource allocation and management&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Load Balancing&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Custom load balancing solutions&lt;/td&gt;
&lt;td&gt;Built-in load balancing with Kubernetes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Dependency Management&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Manual dependency installation&lt;/td&gt;
&lt;td&gt;Consistent environment with Docker containers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cluster Coordination&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Complex and manual&lt;/td&gt;
&lt;td&gt;Simplified with Kubernetes service discovery and coordination&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Development Overhead&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;High, with custom solutions needed&lt;/td&gt;
&lt;td&gt;Reduced, with Ray and Kubernetes handling many aspects&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Flexibility&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Limited adaptability to changing workloads&lt;/td&gt;
&lt;td&gt;High flexibility with dynamic scaling and resource allocation&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Kubernetes provides an ideal platform for running distributed applications like Ray due to its robust orchestration capabilities. Below are the key benefits of running Ray on Kubernetes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Resource management&lt;/li&gt;
&lt;li&gt;Scalability&lt;/li&gt;
&lt;li&gt;Orchestration&lt;/li&gt;
&lt;li&gt;Integration with ecosystem&lt;/li&gt;
&lt;li&gt;Easy deployment and management&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The KubeRay Operator makes it possible to run Ray on Kubernetes.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is KubeRay?
&lt;/h2&gt;

&lt;p&gt;The &lt;a href="https://github.com/ray-project/kuberay" rel="noopener noreferrer"&gt;KubeRay Operator&lt;/a&gt; simplifies managing Ray clusters on Kubernetes by automating tasks such as deployment, scaling, and maintenance. It uses Kubernetes Custom Resource Definitions (CRDs) to manage Ray-specific resources.&lt;/p&gt;

&lt;h3&gt;
  
  
  KubeRay CRDs
&lt;/h3&gt;

&lt;p&gt;KubeRay provides three distinct CRDs.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fltuo9493nucbbfkqsxz2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fltuo9493nucbbfkqsxz2.png" alt="KubeRay" width="800" height="321"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.ray.io/en/latest/cluster/kubernetes/index.html" rel="noopener noreferrer"&gt;(Image source)&lt;/a&gt; &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;RayCluster&lt;/strong&gt;: This CRD manages the RayCluster's lifecycle and handles autoscaling based on the defined configuration.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;RayJob&lt;/strong&gt;: It is useful when there is a one-time job you want to run instead of keeping a standby RayCluster running all the time. It creates a RayCluster and submits the job when ready. Once the job is done, it deletes the RayCluster. This helps in automatically recycling the RayCluster.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;RayService&lt;/strong&gt;: This also creates a RayCluster but deploys a RayServe application on it. This CRD makes it possible to do in-place updates to the application, providing zero-downtime upgrades and updates to ensure the high-availability of the application. &lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Use-cases of KubeRay
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Deploying an on-demand model using RayService
&lt;/h3&gt;

&lt;p&gt;RayService allows you to deploy models on-demand in a Kubernetes environment. This can be particularly useful for applications like image generation or text extraction, where models are deployed only when needed.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://raw.githubusercontent.com/ray-project/kuberay/v1.0.0/ray-operator/config/samples/ray-service.stable-diffusion.yaml" rel="noopener noreferrer"&gt;Here is an example of Stable Diffuison&lt;/a&gt;. Once it is applied in Kubernetes, it will create RayCluster and also run a RayService, which will serve the model until you delete this resource. It allows users to take control of resources.&lt;/p&gt;

&lt;h3&gt;
  
  
  Training a model on a GPU cluster using RayJob
&lt;/h3&gt;

&lt;p&gt;RayService serves a different need: it keeps the model or application deployed until it is deleted manually. In contrast, RayJob suits one-time work such as training a model, preprocessing data, or running inference on a fixed set of prompts.&lt;/p&gt;

&lt;h3&gt;
  
  
  Run inference server on Kubernetes using RayService or RayJob
&lt;/h3&gt;

&lt;p&gt;Generally, we run applications as Kubernetes Deployments, which provide rolling updates without downtime. Similarly, in KubeRay, this can be achieved with RayService, which deploys the model or application and handles the rolling updates.&lt;/p&gt;

&lt;p&gt;However, there could be cases where you just want to do batch inference instead of running the inference servers or applications for a long time. This is where you can leverage RayJob, which is similar to the Kubernetes Job resource.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.ray.io/en/latest/data/examples/huggingface_vit_batch_prediction.html" rel="noopener noreferrer"&gt;Image Classification Batch Inference with Huggingface Vision Transformer&lt;/a&gt; is an example of RayJob, which does Batch Inferencing.&lt;/p&gt;

&lt;p&gt;These are the use cases of KubeRay, enabling you to do more with the Kubernetes cluster. With the help of KubeRay, you can run mixed workloads on the same Kubernetes cluster and offload GPU-based workload scheduling to Ray.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Distributed parallel processing offers a scalable solution for handling large-scale, resource-intensive tasks. Ray simplifies the complexities of building distributed applications, while KubeRay integrates Ray with Kubernetes for seamless deployment and scaling. This combination enhances performance, scalability, and fault tolerance, making it ideal for web crawling, data analytics, and machine learning tasks. By leveraging Ray and KubeRay, you can efficiently manage distributed computing, meeting the demands of today's data-driven world with ease.&lt;/p&gt;

&lt;p&gt;Not only that, but as our compute resource types are changing from CPU to GPU-based, it becomes important to have efficient and scalable cloud infrastructure for all sorts of applications, whether it be AI or large data processing. For that, you can bring in &lt;a href="https://www.infracloud.io/build-ai-cloud/" rel="noopener noreferrer"&gt;AI and GPU Cloud experts&lt;/a&gt; onboard to help you out. &lt;/p&gt;

&lt;p&gt;We hope you found this post informative and engaging. For more posts like this one, subscribe to our weekly newsletter. We’d love to hear your thoughts on this post, so do start a conversation on &lt;a href="https://www.linkedin.com/in/sudhanshu212/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>aiops</category>
      <category>mlops</category>
      <category>llm</category>
      <category>kubernetes</category>
    </item>
    <item>
      <title>10 Feature Flag Tools to Confidently Release New Features</title>
      <dc:creator>Sudhanshu Prajapati</dc:creator>
      <pubDate>Fri, 12 Jul 2024 05:05:18 +0000</pubDate>
      <link>https://dev.to/infracloud/10-feature-flag-tools-to-confidently-release-new-features-3e76</link>
      <guid>https://dev.to/infracloud/10-feature-flag-tools-to-confidently-release-new-features-3e76</guid>
      <description>&lt;p&gt;Feature flags offer an excellent way to quickly turn off and on product changes by enabling you to remove and add the code in the software quickly. Marketers or product managers can choose a time and moment to make a feature or function live to win that aha moment.&lt;/p&gt;

&lt;p&gt;Feature flags are helpful to various departments, including marketing, product, testing, CROs, and development. The number of feature flags can rise quickly as teams realize their usefulness and begin to rely on them. To avoid the mismanagement this can create, you need a feature flag platform: a comprehensive space where you can keep all your feature flags and manage, modify, and delete them.&lt;/p&gt;

&lt;p&gt;Finding a tool that fits the exact needs and requirements of developers, marketers, and product managers can be challenging. But don’t worry; we have done the heavy lifting for you. In this article, we have curated a list of the 10 feature flag tools and their best features. We've also covered the common functionalities you should look for when selecting tools for your team.&lt;/p&gt;

&lt;h2&gt;
  
  
  What are feature flag tools?
&lt;/h2&gt;

&lt;p&gt;A feature flag tool, also known as a feature management or feature toggle tool, is a software or platform designed to facilitate the implementation, management, and control of feature flags in software applications. These tools provide a centralized interface or API that allows developers and teams to easily create, deploy, and monitor feature flags without directly modifying the underlying codebase.&lt;/p&gt;

&lt;p&gt;To understand feature flag tools, let’s first summarize what feature flags are.&lt;/p&gt;

&lt;p&gt;Feature flags, also known as feature toggles or feature switches, are a software development technique used to enable or disable certain features or functionalities in an application or system. They allow developers to control the release and availability of specific features to different user segments or environments without code deployments or separate branches.&lt;/p&gt;
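&lt;p&gt;In code, a feature flag can be as small as a conditional on a flag value. The sketch below uses a plain dict as a stand-in for a real flag store or platform SDK, with an illustrative flag name:&lt;/p&gt;

```python
# A minimal feature toggle: the flag value picks the code path at
# runtime, so new behavior can ship dark and be switched on later
# without a redeploy. The dict here is a stand-in for a flag store
# or platform SDK; the flag name is illustrative.
FLAGS = {"new-greeting": False}

def greet(name):
    if FLAGS["new-greeting"]:
        return f"Welcome aboard, {name}!"  # new behavior behind the flag
    return f"Hello, {name}."               # stable, existing behavior

print(greet("Ada"))           # -> Hello, Ada.
FLAGS["new-greeting"] = True  # flip the flag at runtime
print(greet("Ada"))           # -> Welcome aboard, Ada!
```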

&lt;h2&gt;
  
  
  Do feature flag platforms help?
&lt;/h2&gt;

&lt;p&gt;Yes. A feature flag platform comes with a range of features, including centralized flag management, an easy-to-use interface, user segmentation, traffic allocation, and integration with other tools, to simplify the use of feature flags in software development. &lt;/p&gt;

&lt;p&gt;A feature flag platform enables you to:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Gradually roll out new features&lt;/strong&gt;: Release features to a small percentage of users and gradually increase rollout for feedback and risk mitigation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Perform A/B testing&lt;/strong&gt;: Run experiments exposing different feature variations to user segments to determine optimal performance.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enable feature toggling&lt;/strong&gt;: Dynamically enable or disable features without code changes for flexible control over feature availability.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rollback problematic features&lt;/strong&gt;: Quickly deactivate features causing issues and revert to a stable state to maintain system stability.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Practice trunk-based development&lt;/strong&gt;: Merge code to the main branch without releasing it to production.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Personalize user experiences&lt;/strong&gt;: Customize user experiences based on attributes, roles, or preferences to enhance satisfaction and engagement.&lt;/li&gt;
&lt;/ul&gt;
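&lt;p&gt;Gradual rollout, for example, is typically implemented by bucketing users deterministically. The sketch below shows one common scheme, hashing the flag name and user id into a stable bucket; it is illustrative, as each platform’s SDK implements its own bucketing:&lt;/p&gt;

```python
# A sketch of percentage-based gradual rollout: hash the flag name and
# user id into a stable bucket from 0 to 99, and enable the flag when
# the bucket falls under the rollout percentage. Illustrative only;
# real platforms implement their own bucketing in the SDK.
import hashlib

def is_enabled(flag, user_id, rollout_percent):
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100      # stable bucket per (flag, user)
    return rollout_percent > bucket     # same user, same answer each time

# At 100% everyone is in; at 0% nobody is.
print(is_enabled("new-checkout", "user-42", 100))  # -> True
print(is_enabled("new-checkout", "user-42", 0))    # -> False
```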

&lt;p&gt;For a non-technical person, doing all of this through the CLI and code can be confusing and challenging. Plus, as you keep creating and using flags, you will end up with many of them, which can lead to mismanagement. A feature flag tool helps you there.&lt;/p&gt;

&lt;h2&gt;
  
  
  Popular feature flag tools
&lt;/h2&gt;

&lt;p&gt;InfraCloud DevOps, platform engineering, and software development teams extensively use feature flags. So, we asked them which tools they preferred and why.&lt;/p&gt;

&lt;p&gt;We uncovered many feature flag tools, both open source and commercial. The ‘best’ depends on the project requirements and engineers' preferences. However, there are still basic features that a feature flag software must have. Here, we have shortlisted feature flag software covering fundamental features and advanced capabilities for specific use cases.&lt;/p&gt;

&lt;p&gt;For now, let’s see the best feature flag tools:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;FeatureHub&lt;/li&gt;
&lt;li&gt;Unleash&lt;/li&gt;
&lt;li&gt;Flipt&lt;/li&gt;
&lt;li&gt;GrowthBook&lt;/li&gt;
&lt;li&gt;Flagsmith&lt;/li&gt;
&lt;li&gt;Flagd&lt;/li&gt;
&lt;li&gt;LaunchDarkly&lt;/li&gt;
&lt;li&gt;Split&lt;/li&gt;
&lt;li&gt;ConfigCat&lt;/li&gt;
&lt;li&gt;CloudBees
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Let's discuss each of them in detail.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. FeatureHub
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwq8uiqsgd5v5g7j2nd6b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwq8uiqsgd5v5g7j2nd6b.png" alt="FeatureHub" width="800" height="529"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;(Image src: &lt;a href="https://www.featurehub.io/" rel="noopener noreferrer"&gt; FeatureHub&lt;/a&gt;) &lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.featurehub.io/" rel="noopener noreferrer"&gt;FeatureHub&lt;/a&gt; is a cloud-native feature flag platform that allows you to run experiments across services in your environment with a user-friendly interface — FeatureHub Admin Console. It comes with a variety of SDKs so you can connect FeatureHub with your software. Whether you are a tester, developer, or marketer, you can control all the feature flags and their visibility in any environment.&lt;/p&gt;

&lt;p&gt;If you are looking for a tool that focuses more on feature and configuration management, FeatureHub may be the better choice. Its microservices architecture allows for greater scalability and extensibility, and it provides advanced features such as versioning, templates, and the ability to roll back changes.&lt;/p&gt;

&lt;p&gt;Features of FeatureHub:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Open source version available&lt;/li&gt;
&lt;li&gt;SaaS version in beta&lt;/li&gt;
&lt;li&gt;Google Analytics/RBAC/A/B Testing&lt;/li&gt;
&lt;li&gt;SDKs available for Python, Ruby, and Go&lt;/li&gt;
&lt;li&gt;OpenFeature support in progress&lt;/li&gt;
&lt;li&gt;SSO support&lt;/li&gt;
&lt;li&gt;Community support &amp;amp; documentation&lt;/li&gt;
&lt;li&gt;Dedicated support to SaaS users&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. Unleash
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkn7ebdux475xcq0vq073.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkn7ebdux475xcq0vq073.png" alt="Unleash" width="800" height="457"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;(Image src: &lt;a href="https://github.com/Unleash/unleash" rel="noopener noreferrer"&gt; Unleash&lt;/a&gt;) &lt;/p&gt;

&lt;p&gt;With 10M+ Docker downloads, &lt;a href="https://www.getunleash.io/" rel="noopener noreferrer"&gt;Unleash&lt;/a&gt; is a popular and widely used open source feature flag platform. As it supports Docker images, you can scale it horizontally by deploying it on Kubernetes. The platform's intuitive interface and robust API make it accessible and flexible for developers, testers, and product managers alike.&lt;/p&gt;

&lt;p&gt;However, the open source version lacks several critical functions, such as SSO, RBAC, network traffic overview, and notifications. That said, you can add these capabilities using other open source solutions.&lt;/p&gt;

&lt;p&gt;If you are looking for a tool that focuses more on feature flagging and targeting, then Unleash might be the better choice for you. Unleash provides more advanced capabilities for user targeting, including the ability to target users based on custom attributes and the ability to use percentage rollouts. Additionally, it has a wider range of integrations with popular development tools, including Datadog, Quarkus, Jira, and Vue.&lt;/p&gt;

&lt;p&gt;Features of Unleash:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Open source version available&lt;/li&gt;
&lt;li&gt;AB Testing/RBAC/Targeted Release/Canary release&lt;/li&gt;
&lt;li&gt;SDK support for Go, Java, Node.js, PHP, Python etc&lt;/li&gt;
&lt;li&gt;OpenFeature supported&lt;/li&gt;
&lt;li&gt;Community support and documentation&lt;/li&gt;
&lt;li&gt;Premium support for paid users&lt;/li&gt;
&lt;li&gt;Observability with Prometheus&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. Flipt
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzz82ivfyq37x7nqfd49n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzz82ivfyq37x7nqfd49n.png" alt="Flipt" width="800" height="297"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;(Image src: &lt;a href="https://www.flipt.io/" rel="noopener noreferrer"&gt; Flipt&lt;/a&gt;) &lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.flipt.io/" rel="noopener noreferrer"&gt;Flipt&lt;/a&gt; is a 100% open source, self-hosted feature flag application that helps product teams to manage all their features smoothly from a dashboard. You can also integrate Flipt with your GitOps workflow and manage feature flags as code. With Flipt, you get all the necessary features, including flag management and segment-wise rollout. The platform is built in the Go language and is optimized for performance. The project is under active development with a public roadmap.&lt;/p&gt;

&lt;p&gt;Features of Flipt:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Only open source version&lt;/li&gt;
&lt;li&gt;No SaaS&lt;/li&gt;
&lt;li&gt;Support for REST &amp;amp; GRPC API&lt;/li&gt;
&lt;li&gt;Native client SDKs available in Go, Ruby, Java, Python etc.&lt;/li&gt;
&lt;li&gt;OpenFeature supported&lt;/li&gt;
&lt;li&gt;SSO with OIDC &amp;amp; Static Token&lt;/li&gt;
&lt;li&gt;Observability out of the box with Prometheus &amp;amp; OpenTelemetry&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  4. GrowthBook
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6mg33xdi5gwcojlown1m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6mg33xdi5gwcojlown1m.png" alt="GrowthBook" width="800" height="536"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;(Image src: &lt;a href="https://www.growthbook.io/" rel="noopener noreferrer"&gt; GrowthBook&lt;/a&gt;) &lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.growthbook.io/" rel="noopener noreferrer"&gt;GrowthBook&lt;/a&gt; is primarily a product testing platform for checking users' responses to features. It is relatively new, and the SaaS version is much more affordable than other SaaS-based feature flag platforms. SDKs from GrowthBook are available in all major languages and are designed not to interfere with feature flag rendering.&lt;/p&gt;

&lt;p&gt;You can easily create experiments using GrowthBook's drag-and-drop interface. Integrations with popular analytics tools, such as Google Analytics and Mixpanel, make tracking experiments easier for better results. If you run many A/B experiments and do not want to share your data with 3rd party apps, GrowthBook could be an amazing option as it pulls the data directly from the source.&lt;/p&gt;

&lt;p&gt;Features of GrowthBook:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Open source version available&lt;/li&gt;
&lt;li&gt;SaaS version available&lt;/li&gt;
&lt;li&gt;A/B Testing/unlimited projects&lt;/li&gt;
&lt;li&gt;SDK support for React, PHP, Ruby, Python, Go, etc&lt;/li&gt;
&lt;li&gt;Observability via Audit Log&lt;/li&gt;
&lt;li&gt;Community support and documentation&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  5. Flagsmith
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpukuslmabm9i1cgchhdi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpukuslmabm9i1cgchhdi.png" alt="Flagsmith" width="800" height="485"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;(Image src: &lt;a href="https://github.com/Flagsmith/flagsmith" rel="noopener noreferrer"&gt; Flagsmith&lt;/a&gt;) &lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.flagsmith.com/" rel="noopener noreferrer"&gt;Flagsmith&lt;/a&gt; is another open source solution for creating and managing feature flags easily across web, mobile, and server-side applications. You can wrap a section of code with a flag and then use the Flagsmith dashboard to toggle that feature on or off for different environments, users, or user segments.&lt;/p&gt;

&lt;p&gt;Flagsmith offers segments, A/B testing, and analytics engine integrations out of the box. However, if you want real-time updates on the front end, you have to build your own real-time infrastructure. One of the best parts of Flagsmith is Remote Config, which lets you change the application in real time, saving you from the approval process for new features.&lt;/p&gt;

&lt;p&gt;Features of Flagsmith:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Open source version available&lt;/li&gt;
&lt;li&gt;SaaS product available&lt;/li&gt;
&lt;li&gt;A/B Testing/RBAC/Integrations with tool&lt;/li&gt;
&lt;li&gt;SDK support for Ruby, .NET, PHP, Go, Rust, etc&lt;/li&gt;
&lt;li&gt;OpenFeature support&lt;/li&gt;
&lt;li&gt;HelpDesk for community support&lt;/li&gt;
&lt;li&gt;Docker/Kubernetes/OpenShift/On-Premise (Paid)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  6. Flagd
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2phfefuofqthmn256lyk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2phfefuofqthmn256lyk.png" alt="Flagd" width="800" height="301"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;(Image src: &lt;a href="https://github.com/open-feature/flagd" rel="noopener noreferrer"&gt; Flagd&lt;/a&gt;) &lt;/p&gt;

&lt;p&gt;&lt;a href="https://flagd.dev/" rel="noopener noreferrer"&gt;Flagd&lt;/a&gt; is a unique feature flag platform. It does not have a UI, management console, or persistence layer and is completely configurable via a POSIX-style CLI. Due to this, Flagd is extremely flexible and can be fit into various infrastructures to run on various architectures. It supports multiple &lt;a href="https://flagd.dev/concepts/syncs/" rel="noopener noreferrer"&gt;feature flag sources called syncs&lt;/a&gt; like file, http, gRPC, Kubernetes custom resource, and has ability to merge those flags.&lt;/p&gt;

&lt;p&gt;Features of Flagd:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Only open source version is available&lt;/li&gt;
&lt;li&gt;Progressive rollouts&lt;/li&gt;
&lt;li&gt;Works with OpenFeature SDK&lt;/li&gt;
&lt;li&gt;Technical documentation&lt;/li&gt;
&lt;li&gt;Lightweight and flexible&lt;/li&gt;
&lt;/ul&gt;
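&lt;p&gt;As a sketch of flagd's file sync, a minimal flag definition could look like the following (the flag name and variants here are illustrative; see the flagd documentation for the full schema):&lt;/p&gt;

```json
{
  "flags": {
    "new-welcome-banner": {
      "state": "ENABLED",
      "variants": { "on": true, "off": false },
      "defaultVariant": "off"
    }
  }
}
```

&lt;p&gt;Pointing flagd at this file serves the flag to applications through OpenFeature SDKs, and editing the file updates the flag without redeploying.&lt;/p&gt;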

&lt;h3&gt;
  
  
  7. LaunchDarkly
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F00nrz6lqdxcjh9upec9a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F00nrz6lqdxcjh9upec9a.png" alt="LaunchDarkly" width="800" height="630"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;(Image src: &lt;a href="https://launchdarkly.com/" rel="noopener noreferrer"&gt; LaunchDarkly&lt;/a&gt;) &lt;/p&gt;

&lt;p&gt;&lt;a href="https://launchdarkly.com/" rel="noopener noreferrer"&gt;LaunchDarkly&lt;/a&gt; is a good entry point for premium feature management tools as it is not expensive comparatively but offers many useful features. It enables you to easily create, manage, and organize your feature flags at scale. You can also schedule approved feature flags to build a custom workflow.&lt;/p&gt;

&lt;p&gt;One notable LaunchDarkly feature is Prerequisites, which lets you create feature flag hierarchies in which triggering one flag unlocks other flags that control the user experience. This way, you can execute multiple feature flags with one toggle. With multiple integration options available, including an API, SDK support, and Git tools, you can automate various tasks in LaunchDarkly.&lt;/p&gt;

&lt;p&gt;If you are looking for paid software with quality support and a comprehensive set of features, LaunchDarkly could be your option.&lt;/p&gt;

&lt;p&gt;Features of LaunchDarkly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No open source version is available&lt;/li&gt;
&lt;li&gt;SaaS product only&lt;/li&gt;
&lt;li&gt;A/B Testing/Multiple variants testing&lt;/li&gt;
&lt;li&gt;SDK support for Go, Gatsby, Flutter, Java, PHP etc&lt;/li&gt;
&lt;li&gt;OpenFeature supported&lt;/li&gt;
&lt;li&gt;Academy, blogs, tutorials, guides &amp;amp; documentation&lt;/li&gt;
&lt;li&gt;Live chat support&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  8. Split
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flzd0kbko9m3cbkodxzx3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flzd0kbko9m3cbkodxzx3.png" alt="Split" width="800" height="403"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;(Image src: &lt;a href="https://www.youtube.com/watch?v=sRRVsptOHqQ" rel="noopener noreferrer"&gt; Split&lt;/a&gt;) &lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.split.io/" rel="noopener noreferrer"&gt;Split&lt;/a&gt; brings an impressive set of features and a cost-effective solution for feature flag management. It connects the feature with engineering and customer data &amp;amp; sends alerts when a new feature misbehaves. With Split, you can easily define percentage rollouts to measure the impact of features.&lt;/p&gt;

&lt;p&gt;There is no community support, but the documentation is detailed and well organized. Once you get past the slight learning curve, you can easily organize all your feature flags at scale with Split.&lt;/p&gt;

&lt;p&gt;Features of Split:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No open source version&lt;/li&gt;
&lt;li&gt;SaaS-based platform&lt;/li&gt;
&lt;li&gt;A/B Testing/Multi-variant testing/Dimension analysis&lt;/li&gt;
&lt;li&gt;SDK support for Go, Python, Java, PHP etc&lt;/li&gt;
&lt;li&gt;OpenFeature supported&lt;/li&gt;
&lt;li&gt;Blogs, guides &amp;amp; documentation&lt;/li&gt;
&lt;li&gt;No on-prem solution&lt;/li&gt;
&lt;li&gt;Free plan available&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  9. ConfigCat
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5av0umhechkspsu3d5de.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5av0umhechkspsu3d5de.png" alt="ConfigCat" width="800" height="413"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;(Image src: &lt;a href="https://www.youtube.com/watch?v=AjHySwLf0DY" rel="noopener noreferrer"&gt; ConfigCat&lt;/a&gt;) &lt;/p&gt;

&lt;p&gt;&lt;a href="https://configcat.com/" rel="noopener noreferrer"&gt;ConfigCat&lt;/a&gt; enables product teams to run experiments (without involving developer resources) to measure user interactions and release new features to the products. You can turn the features ON/OFF via a user-friendly dashboard even after your code is deployed.&lt;/p&gt;

&lt;p&gt;ConfigCat can be integrated with many tools and services, including Datadog, Slack, Zapier, and Trello. It provides open source SDKs to support easy integration with your mobile or desktop application, website, or any backend system. One fantastic feature of this software is Zombie Flags, which identifies flags that are no longer functional or have not been used for a long time and should be removed.&lt;/p&gt;

&lt;p&gt;Features of ConfigCat:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No open source version is available&lt;/li&gt;
&lt;li&gt;SaaS product&lt;/li&gt;
&lt;li&gt;% rollouts, A/B testing/variations.&lt;/li&gt;
&lt;li&gt;SDK support for Go, Java, Python, PHP, Ruby etc&lt;/li&gt;
&lt;li&gt;OpenFeature supported&lt;/li&gt;
&lt;li&gt;Blogs, documentation &amp;amp; Slack community support&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  10. CloudBees
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8jyy0d8m6jzng4y4wh5v.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8jyy0d8m6jzng4y4wh5v.png" alt="CloudBees" width="800" height="340"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;(Image src: &lt;a href="https://www.cloudbees.com/capabilities/feature-management" rel="noopener noreferrer"&gt; CloudBees&lt;/a&gt;) &lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.cloudbees.com/" rel="noopener noreferrer"&gt;CloudBees&lt;/a&gt; is not a dedicated feature flag management platform, but it allows you to manage feature flag permissions and automate cleanup easily. While having a dashboard helps, CloudBees also offers bidirectional configuration as code with GitHub to edit flags in your preferred environments.&lt;/p&gt;

&lt;p&gt;The dashboard's sleek and intuitive design makes it easier for developers and DevOps teams to use and leverage its functionalities. However, the software has so many features that it could be a slight challenge to learn all of them.&lt;/p&gt;

&lt;p&gt;Features of CloudBees:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No open source version is available&lt;/li&gt;
&lt;li&gt;SaaS product&lt;/li&gt;
&lt;li&gt;A/B Testing/Multiple variant testing&lt;/li&gt;
&lt;li&gt;SDK support for Java, Python, C++, Ruby etc&lt;/li&gt;
&lt;li&gt;OpenFeature supported&lt;/li&gt;
&lt;li&gt;Blogs, video tutorials, &amp;amp; documentation&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Quick comparison of the feature flag tools
&lt;/h2&gt;

&lt;p&gt;Open the sheet to see a &lt;a href="https://docs.google.com/spreadsheets/d/17LCvDlitKlR5d16T-bAkHXhAnEZPKPBBzmH5FruZMwk/" rel="noopener noreferrer"&gt;comparison of feature flag tools&lt;/a&gt; at a glance.&lt;/p&gt;

&lt;h2&gt;
  
  
  What should you look for in a feature flag tool?
&lt;/h2&gt;

&lt;p&gt;There are so many feature flag tools, but these are the features you must look for when picking a platform.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Community support
&lt;/h3&gt;

&lt;p&gt;Proper support is crucial to overcoming the initial onboarding challenges, whether for an open source or proprietary product. Some OSSs have an extensive community, documentation, blogs, and user-generated content to help and educate the next generation of users. The OSS product's creators, maintainers, and experts often offer commercial support. For example, at InfraCloud, we offer &lt;a href="https://www.infracloud.io/linkerd-consulting-support/" rel="noopener noreferrer"&gt;Linkerd support&lt;/a&gt;, &lt;a href="https://www.infracloud.io/prometheus-commercial-support/" rel="noopener noreferrer"&gt;Prometheus support&lt;/a&gt;, and &lt;a href="https://www.infracloud.io/istio-support/" rel="noopener noreferrer"&gt;Istio support&lt;/a&gt; because our engineers are proficient in these technologies.&lt;/p&gt;

&lt;p&gt;For closed source products, you can get video tutorials, blogs, documentation, and live chat, and most importantly, you can raise a ticket and solve your problem quickly. Not having a proper support channel can leave you stranded during an emergency. So, analyze your requirements to see what kind of support your team needs: whether they can manage with the help of documentation or need hand-holding.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Integration
&lt;/h3&gt;

&lt;p&gt;For a successful feature flag process, it is critical that the programming languages used to develop your products are well supported by the feature flag platform. If a language is not supported, enough resources should be available to connect your product and the feature flag platform.&lt;/p&gt;

&lt;p&gt;Going with platforms that support OpenFeature could be a good solution, as OpenFeature provides a vendor-agnostic, community-driven API for feature flagging that works with your favorite feature flag management tool. You would not have to change much application code if you decide to switch tools later.&lt;/p&gt;
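&lt;p&gt;To make the vendor-agnostic idea concrete, here is a minimal Python sketch in the spirit of OpenFeature (the class and method names are hypothetical, not the real SDK API): application code depends only on a small client interface, so swapping the backing provider later requires no application changes.&lt;/p&gt;

```python
# Hypothetical sketch of a vendor-agnostic feature flag interface,
# in the spirit of OpenFeature; not the real SDK API.

class FlagProvider:
    """Interface that every vendor-specific provider implements."""
    def resolve_boolean(self, flag_key, default):
        raise NotImplementedError

class InMemoryProvider(FlagProvider):
    """Toy provider backed by a dict; a real one would call a vendor API."""
    def __init__(self, flags):
        self.flags = flags

    def resolve_boolean(self, flag_key, default):
        return self.flags.get(flag_key, default)

class FlagClient:
    """Application code depends only on this client, never on a vendor."""
    def __init__(self, provider):
        self.provider = provider

    def get_boolean_value(self, flag_key, default=False):
        return self.provider.resolve_boolean(flag_key, default)

client = FlagClient(InMemoryProvider({"new-checkout": True}))
print(client.get_boolean_value("new-checkout"))  # True
print(client.get_boolean_value("unknown-flag"))  # False
```

&lt;p&gt;Switching from the in-memory provider to a vendor-backed one only changes the line that constructs the client.&lt;/p&gt;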

&lt;p&gt;In the list, I mentioned feature flag platforms that support the most common and popular development languages and are OpenFeature friendly. When selecting a feature flag platform, don’t forget to analyze your tech stack to check whether the platform is compatible. Otherwise, a major chunk of time might go into developing integrations between your technology stack and the feature flag platform.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. 3rd Party Apps
&lt;/h3&gt;

&lt;p&gt;What if you could view and monitor feature flags and approval requests from your team's Slack workspace or use Terraform to configure and control the feature flags?&lt;/p&gt;

&lt;p&gt;All this and more is possible if the feature flag platform offers integrations. You could build integrations yourself by wrangling scripts and setting up trigger-based automation. But here, we picked software with native integration abilities to further streamline &amp;amp; automate feature flag operations.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Easy-to-use UI
&lt;/h3&gt;

&lt;p&gt;Feature flags are not used only by developers. Often, product marketers like to have control over the lever that launches features to the public. If an issue arises, marketers and product managers can quickly kill a feature that destabilizes the product, right from the platform, without waiting for a developer.&lt;/p&gt;

&lt;p&gt;So, having an easy-to-use user interface is a key characteristic when selecting a feature flag tool. Some open source feature flag platforms have a rudimentary design covering basics, and some are fully-fledged platforms with incredible UX and tutorials at every corner.&lt;/p&gt;

&lt;p&gt;In the list, we covered the software that has a usable UI.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Testing &amp;amp; reporting
&lt;/h3&gt;

&lt;p&gt;New features can be tested using feature flags. Sophisticated feature flag tools come with various testing methods, including A/B/n testing and the blue-green deployment strategy. Functions like setting up variable and controlled factors, allocating traffic, and drawing insights from the results are extremely helpful for delivering a product feature confidently.&lt;/p&gt;

&lt;p&gt;With feature flag tools, you can segment users and roll out features accordingly to test initial responses. The software also comes with dashboards to see the results of the experiments. You can view all the requests and see how users spend time in the software with newly released features.&lt;/p&gt;

&lt;p&gt;These tools include testing and reporting features, making it easy to run experiments and make data-backed decisions.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQs related to feature flag tools
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What are the different types of feature flags?
&lt;/h3&gt;

&lt;p&gt;There are several types of feature flags commonly used in software development:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Boolean flags&lt;/strong&gt;: These flags are the simplest feature flags based on a true/false value. They enable or disable a feature globally across all users or environments.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Percentage rollouts&lt;/strong&gt;: Also known as "gradual rollouts" or "canary releases," these flags allow features to be gradually released to a percentage of users. For example, a feature can be enabled for 10% of users initially, then gradually increased to 25%, 50%, etc.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;User segmentation flags&lt;/strong&gt;: These flags enable features for specific user segments based on predefined criteria such as user attributes, roles, or subscription levels. They allow targeted feature releases to specific groups of users.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Feature toggle flags&lt;/strong&gt;: Feature toggle flags provide more granular control over the behavior of a feature. They allow different variations or configurations of a feature to be activated or deactivated dynamically.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
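&lt;p&gt;The flag types above can be sketched in a few lines of Python (a toy illustration; real tools persist flags and expose them via dashboards and SDKs):&lt;/p&gt;

```python
import hashlib

def bucket(user_id):
    # Deterministically map a user to a bucket 0-99 so a rollout is
    # "sticky": the same user always gets the same decision.
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    return int(digest, 16) % 100

def percentage_rollout(user_id, pct):
    # Enable the feature for roughly pct percent of users.
    return bucket(user_id) in range(pct)

def segmentation_flag(user, allowed_roles):
    # Enable the feature only for users in the given segment (here, roles).
    return user.get("role") in allowed_roles

boolean_flags = {"dark-mode": True}  # simplest global on/off flag

print(boolean_flags["dark-mode"])                              # True
print(segmentation_flag({"role": "beta"}, {"beta", "staff"}))  # True
print(percentage_rollout("user-42", 10))  # stable across calls
```

&lt;p&gt;The hash-based bucketing is what lets a percentage rollout grow from 10% to 25% without flipping the decision for users who already have the feature.&lt;/p&gt;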

&lt;h3&gt;
  
  
  Who uses feature flags?
&lt;/h3&gt;

&lt;p&gt;Software development teams, including developers, product managers, and DevOps engineers, widely use feature flags. They are particularly beneficial in agile and continuous delivery environments, where iterative development, experimentation, and frequent releases are essential.&lt;/p&gt;

&lt;h3&gt;
  
  
  What are feature flags' limitations?
&lt;/h3&gt;

&lt;p&gt;While feature flags offer numerous advantages, they also have some limitations to consider:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Increased complexity&lt;/strong&gt;: Introducing feature flags adds complexity to the codebase and requires careful management to avoid technical debt and maintainability issues.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Performance overhead&lt;/strong&gt;: Feature flags introduce conditional checks that can impact performance, especially when numerous flags are evaluated at runtime.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Flag proliferation&lt;/strong&gt;: Over time, the number of feature flags may grow, leading to potential confusion, maintenance challenges, and increased technical debt.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Testing effort&lt;/strong&gt;: Feature flags require additional testing efforts to ensure the functionality of different flag combinations and variations.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  What is the difference between a feature gate and a feature flag?
&lt;/h3&gt;

&lt;p&gt;The terms "feature gate" and "feature flag" are often used interchangeably, but they can have slightly different connotations. A feature gate typically refers to a more granular control mechanism that checks whether a specific user has access to a particular feature, usually based on permissions or user roles. On the other hand, a feature flag is a broader concept encompassing various flags used to control feature availability, behavior, or rollout.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is a feature flag rollback?
&lt;/h3&gt;

&lt;p&gt;Feature flag rollback refers to deactivating a feature flag and reverting the system's behavior to a previous state. It is typically used when a feature causes unexpected issues, performance problems, or undesirable outcomes. The system can revert to a stable state by rolling back a feature flag until the underlying issues are addressed.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is feature flag hygiene?
&lt;/h3&gt;

&lt;p&gt;Feature flag hygiene refers to best practices and guidelines for managing feature flags effectively. It involves maintaining a clean and manageable set of flags by periodically reviewing and removing obsolete or unused flags.&lt;/p&gt;
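&lt;p&gt;Part of a hygiene review can be automated. The sketch below uses a hypothetical data shape (a mapping from flag name to last-evaluation time; real platforms derive this from evaluation telemetry, as ConfigCat's Zombie Flags does) to list removal candidates:&lt;/p&gt;

```python
from datetime import datetime, timedelta

def stale_flags(last_evaluated, now, max_age_days=90):
    # A flag is "fresh" if its age in days falls within range(max_age_days);
    # anything older has not been evaluated recently and is a cleanup candidate.
    return sorted(
        name
        for name, when in last_evaluated.items()
        if (now - when).days not in range(max_age_days)
    )

now = datetime(2024, 11, 1)
flags = {
    "new-checkout": now - timedelta(days=3),
    "holiday-2022-banner": now - timedelta(days=600),
}
print(stale_flags(flags, now))  # ['holiday-2022-banner']
```

&lt;p&gt;Running a report like this on a schedule keeps the flag inventory small enough to reason about.&lt;/p&gt;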

&lt;h2&gt;
  
  
  Final words
&lt;/h2&gt;

&lt;p&gt;Finding the best feature flag platform isn’t easy, especially when you have many great options. While all these tools are great, you must factor in your requirements to find the best fit.&lt;/p&gt;

&lt;p&gt;We hope this list helps you find the best platform to manage feature flags. This article was developed with contributions from &lt;a href="https://www.linkedin.com/in/faizanfahim/" rel="noopener noreferrer"&gt;Faizan&lt;/a&gt;, &lt;a href="https://www.linkedin.com/in/sagar-parmar-834403a6/" rel="noopener noreferrer"&gt;Sagar&lt;/a&gt;, &lt;a href="https://www.linkedin.com/in/bhavin192/" rel="noopener noreferrer"&gt;Bhavin&lt;/a&gt;, and &lt;a href="https://www.linkedin.com/in/sudhanshu212/" rel="noopener noreferrer"&gt;Sudhanshu&lt;/a&gt;. You can reach out to any of them if you have questions.&lt;/p&gt;

&lt;p&gt;Looking for help with building your DevOps strategy or want to outsource DevOps to the experts? Learn why so many startups &amp;amp; enterprises consider us as one of the &lt;a href="https://www.infracloud.io/devops-consulting-services/" rel="noopener noreferrer"&gt;best DevOps consulting &amp;amp; services companies&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>featureflag</category>
      <category>cicd</category>
      <category>featureflagsplatform</category>
      <category>devops</category>
    </item>
    <item>
      <title>GitOps using Flux and Flagger</title>
      <dc:creator>Sudhanshu Prajapati</dc:creator>
      <pubDate>Fri, 25 Nov 2022 08:21:27 +0000</pubDate>
      <link>https://dev.to/infracloud/gitops-using-flux-and-flagger-15ci</link>
      <guid>https://dev.to/infracloud/gitops-using-flux-and-flagger-15ci</guid>
      <description>&lt;p&gt;GitOps as a practice has been in use since 2017 when Alexis Richardson coined the term. It transformed DevOps and automation. If you look at its core principles, it extends DevOps by treating Infrastructure as Code (IaC). Your deployment configuration is stored in a version control system (a.ka. Git), providing a single source of truth for both dev and ops.&lt;/p&gt;

&lt;p&gt;As the framework's adoption increased, &lt;a href="https://thenewstack.io/5-cloud-native-trends-to-watch-out-for-in-2022/" rel="noopener noreferrer"&gt;GitOps became the standard for continuous deployment&lt;/a&gt; in the cloud native space. Many agile teams &lt;a href="https://www.infracloud.io/devops-consulting-services/" rel="noopener noreferrer"&gt;adopt GitOps&lt;/a&gt; because of familiarity with git-based workflow for release management of cloud native workloads.&lt;/p&gt;

&lt;p&gt;GitOps principles differ from the traditional CI &amp;amp; CD pipeline approach. In the last few years, the GitOps working group under CNCF formalized all the ideas developed around GitOps into a cohesive set of principles that have become &lt;a href="https://opengitops.dev/" rel="noopener noreferrer"&gt;the GitOps Principles&lt;/a&gt;.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Declarative&lt;/li&gt;
&lt;li&gt;Versioned and Immutable&lt;/li&gt;
&lt;li&gt;Pulled automatically&lt;/li&gt;
&lt;li&gt;Continuously Reconciled&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The uses of GitOps helped organizations in the following aspects:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Swifter and more frequent deployments&lt;/li&gt;
&lt;li&gt;Fast and easy disaster recovery&lt;/li&gt;
&lt;li&gt;Effortless credential management&lt;/li&gt;
&lt;li&gt;Improved developer experience&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;GitOps created a standard practice that allowed engineers to focus on developing solutions rather than figuring out how to deploy them.&lt;/p&gt;

&lt;p&gt;However, as companies grow, they ship new features at a higher rate, and the risk of downtime/failures in production also increases. They face challenges like controlling the blast radius and minimizing the risk from recent releases.&lt;/p&gt;

&lt;p&gt;So, is there a way to minimize the blast radius while testing out releases on a subset of users? Yes: Progressive Delivery.&lt;/p&gt;

&lt;h2&gt;
  
  
  Progressive Delivery
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What is Progressive Delivery and who coined the term?&lt;/strong&gt;&lt;br&gt;
The term &lt;a href="https://www.infoq.com/presentations/progressive-delivery/" rel="noopener noreferrer"&gt;Progressive Delivery&lt;/a&gt; was coined by &lt;a href="https://redmonk.com/team/james-governor/" rel="noopener noreferrer"&gt;James Governor at RedMonk&lt;/a&gt;, who talked about new software development practices beyond continuous delivery. As James Governor describes in his talk on Progressive Delivery, the goal is to minimize the blast radius and control the delivery. &lt;br&gt;
This can be done by diverting some traffic to the new deployment, measuring the success metrics, and then promoting the release to all users. Some of the deployment strategies are &lt;a href="https://www.infracloud.io/blogs/progressive-delivery-argo-rollouts-canary-deployment/" rel="noopener noreferrer"&gt;Canary&lt;/a&gt;, &lt;a href="https://www.infracloud.io/blogs/progressive-delivery-argo-rollouts-blue-green-deployment/" rel="noopener noreferrer"&gt;Blue-Green&lt;/a&gt;, and &lt;a href="https://docs.flagger.app/tutorials/istio-ab-testing" rel="noopener noreferrer"&gt;A/B testing&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;There are many tools that allow us to implement Progressive Delivery. Azure DevOps and AWS App Mesh are widely used proprietary tools, while ArgoCD and Flux are widely used open source tools. In this blog post, we will focus on Flux &amp;amp; Flagger, two popular open source tools.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What is Flux?&lt;/strong&gt;&lt;br&gt;
Flux is a tool for keeping Kubernetes clusters in sync with sources of configuration (like Git repositories) and automating updates to the configuration when there is new code to deploy.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What is Flagger?&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://github.com/fluxcd/flagger" rel="noopener noreferrer"&gt;Flagger&lt;/a&gt; is a Progressive Delivery tool that automates the release process for applications running on Kubernetes. Under the hood, both tools are built on top of the same modular GitOps Toolkit, which is the main reason Flagger complements Flux so well.&lt;/p&gt;

&lt;h2&gt;
  
  
  Typical Pipeline
&lt;/h2&gt;

&lt;p&gt;Let's level-set on how a CI/CD pipeline works; then we can talk about how Flux and Flagger fit into the picture. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faejjr6lbufjkk98gvdqo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faejjr6lbufjkk98gvdqo.png" alt="CI/CD Pipeline"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In a typical CI/CD pipeline, we push the latest images to the registry and config changes to a repository. From there, an Ops person reconciles the cluster state with the new config changes by applying new configs or upgrading existing resources in the Kubernetes cluster. This also means the Ops person must know what changes need to be made, along with the context of those changes, so this manual process quickly becomes error-prone.&lt;/p&gt;

&lt;p&gt;This whole process also becomes time-consuming and hard to manage, and issues can surface after applying the latest changes. We need a solid &amp;amp; immediate feedback loop on new releases.&lt;/p&gt;

&lt;p&gt;What if we could automate the whole process from deployment to production and have proper change management in place for application &amp;amp; infra configuration? This is where Flux comes in: it automates image tag updates to Git and reconciles clusters to the desired state as soon as new changes are pushed to the Git repository. &lt;/p&gt;

&lt;h2&gt;
  
  
  Flux
&lt;/h2&gt;

&lt;p&gt;Let’s put Flux in place to see how we resolve all those issues from a typical pipeline.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffiarfxp7gy12kvfniw0n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffiarfxp7gy12kvfniw0n.png" alt="GitOps"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt;
&lt;p&gt;Source: &lt;a href="https://www.gitops.tech/#what-is-gitops" rel="noopener noreferrer"&gt; GitOps &lt;/a&gt;&lt;/p&gt;
&lt;/center&gt;

&lt;p&gt;Flux is based on the Operator pattern. The Operator pattern is a software extension to Kubernetes that uses custom resources to manage applications and their components, and it is built on top of the Kubernetes API.&lt;/p&gt;

&lt;h3&gt;
  
  
  Installation of Flux
&lt;/h3&gt;

&lt;p&gt;Installation of Flux is straightforward. You need to install the Flux CLI and run the bootstrapping process. The bootstrapping process will create a repository on GitHub (or any other git hosting service) along with all the manifests required for installation and connection to the git repository. Follow this doc to &lt;a href="https://fluxcd.io/docs/get-started/" rel="noopener noreferrer"&gt;Get started with Flux&lt;/a&gt;.&lt;/p&gt;
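
&lt;p&gt;As a rough sketch, bootstrapping against GitHub looks like this (the owner, repository, and path values below are placeholders; check the Get Started guide for the exact flags that apply to your setup):&lt;/p&gt;

```sh
# Install the Flux CLI (official install script)
curl -s https://fluxcd.io/install.sh | bash

# Bootstrap: creates the repo if needed, commits the Flux manifests,
# and installs the Flux controllers into the current cluster
flux bootstrap github \
  --owner=my-github-user \
  --repository=my-fleet-repo \
  --branch=main \
  --path=./clusters/my-cluster \
  --personal
```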

&lt;h3&gt;
  
  
  Reconciliation
&lt;/h3&gt;

&lt;p&gt;Flux keeps a constant watch on the changes in your repository. It doesn’t require any external event to start the reconciliation loop, and it lets you configure the loop for each component. For example, you can have your Git repository checked every 3 minutes while the manifests are applied every 10 minutes, allowing you to stagger how reconciliation happens.&lt;/p&gt;
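
&lt;p&gt;As a minimal sketch, those two intervals live on the source and on the apply step respectively (resource names and the repository URL below are illustrative, and API versions may differ across Flux releases):&lt;/p&gt;

```yaml
# GitRepository: fetch the repo every 3 minutes
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: GitRepository
metadata:
  name: podinfo
  namespace: flux-system
spec:
  interval: 3m
  url: https://github.com/example/podinfo   # illustrative URL
  ref:
    branch: main
---
# Kustomization: apply the manifests every 10 minutes
apiVersion: kustomize.toolkit.fluxcd.io/v1beta2
kind: Kustomization
metadata:
  name: podinfo
  namespace: flux-system
spec:
  interval: 10m
  path: ./deploy
  prune: true
  sourceRef:
    kind: GitRepository
    name: podinfo
```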

&lt;h3&gt;
  
  
  Automate image updates to Git
&lt;/h3&gt;

&lt;p&gt;When Flux comes into the picture, it will start watching your image registry for new updates and push the new tags back to git for you, so we no longer have to update image tags by hand. Note that this feature is not enabled by default when setting up Flux; you can follow this guide: &lt;a href="https://fluxcd.io/docs/guides/image-update/" rel="noopener noreferrer"&gt;Automate image update to Git - Flux&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffhcy1ynp1y9jcqmuohn7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffhcy1ynp1y9jcqmuohn7.png" alt="Image reflector and automation controllers - Flux"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt;
&lt;p&gt;Source: &lt;a href="https://fluxcd.io/docs/components/image/" rel="noopener noreferrer"&gt; Image reflector and automation controllers | Flux &lt;/a&gt;&lt;/p&gt;
&lt;/center&gt;
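
&lt;p&gt;The guide above wires together three custom resources; here is a condensed sketch (the image name, semver range, and paths are illustrative, and API versions may differ across Flux releases):&lt;/p&gt;

```yaml
# Watch the registry for new tags
apiVersion: image.toolkit.fluxcd.io/v1beta1
kind: ImageRepository
metadata:
  name: podinfo
  namespace: flux-system
spec:
  image: ghcr.io/example/podinfo   # illustrative image
  interval: 1m
---
# Select which tags are eligible for deployment
apiVersion: image.toolkit.fluxcd.io/v1beta1
kind: ImagePolicy
metadata:
  name: podinfo
  namespace: flux-system
spec:
  imageRepositoryRef:
    name: podinfo
  policy:
    semver:
      range: 5.0.x
---
# Commit the updated tags back to Git
apiVersion: image.toolkit.fluxcd.io/v1beta1
kind: ImageUpdateAutomation
metadata:
  name: flux-system
  namespace: flux-system
spec:
  interval: 5m
  sourceRef:
    kind: GitRepository
    name: flux-system
  update:
    path: ./clusters/my-cluster
    strategy: Setters
```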

&lt;h3&gt;
  
  
  Secret Management
&lt;/h3&gt;

&lt;p&gt;Once you adopt GitOps, you need a way to manage the secrets your application might require to communicate with other services within the Kubernetes cluster.&lt;br&gt;
You can’t simply store your application secrets in plain text inside the git repository, right? This is where encryption comes in: you can commit encrypted secrets to version control and enable Flux to decrypt them.&lt;/p&gt;

&lt;p&gt;Flux provides two guides to store secrets through &lt;a href="https://fluxcd.io/docs/guides/sealed-secrets/" rel="noopener noreferrer"&gt;Sealed Secrets&lt;/a&gt; and &lt;a href="https://fluxcd.io/docs/guides/mozilla-sops/" rel="noopener noreferrer"&gt;Mozilla SOPS&lt;/a&gt;.&lt;/p&gt;
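
&lt;p&gt;With the SOPS approach, for instance, the decryption settings go on the Flux Kustomization that applies the encrypted manifests; a sketch (names are illustrative, and key generation/setup is covered in the Mozilla SOPS guide):&lt;/p&gt;

```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1beta2
kind: Kustomization
metadata:
  name: secrets
  namespace: flux-system
spec:
  interval: 10m
  path: ./secrets          # directory with SOPS-encrypted manifests
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
  decryption:
    provider: sops
    secretRef:
      name: sops-gpg       # Kubernetes secret holding the private key
```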

&lt;h3&gt;
  
  
  Application Delivery
&lt;/h3&gt;

&lt;p&gt;Unlike other options, Flux natively supports Helm and uses the Helm library itself to deploy releases onto the cluster. This means you can run &lt;code&gt;helm ls&lt;/code&gt; on the cluster and see the releases exactly as if they had been installed with &lt;code&gt;helm install&lt;/code&gt;. &lt;/p&gt;

&lt;p&gt;Another important point is that Flux allows you to manage dependencies between HelmRelease CRs or Kustomization CRs, enabling you to control the apply order of collections/groups of YAML files. It does not, however, control the order in which individual YAML files within a group are applied.&lt;/p&gt;
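
&lt;p&gt;A sketch of such a dependency between two HelmReleases (chart and release names are illustrative; API versions may differ across Flux releases):&lt;/p&gt;

```yaml
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: backend
  namespace: apps
spec:
  interval: 10m
  chart:
    spec:
      chart: backend
      sourceRef:
        kind: HelmRepository
        name: charts
---
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: frontend
  namespace: apps
spec:
  interval: 10m
  dependsOn:
    - name: backend   # frontend is reconciled only after backend is ready
  chart:
    spec:
      chart: frontend
      sourceRef:
        kind: HelmRepository
        name: charts
```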

&lt;h3&gt;
  
  
  Promote Release
&lt;/h3&gt;

&lt;p&gt;Flux can help you automate the process of promoting the release with GitHub Actions.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5jdpq52rjqum7csg974g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5jdpq52rjqum7csg974g.png" alt="Promote Flux Helm Releases with GitHub Actions "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt;
&lt;p&gt;Source: &lt;a href="https://fluxcd.io/docs/use-cases/gh-actions-helm-promotion/" rel="noopener noreferrer"&gt; Promote Flux Helm Releases with GitHub Actions &lt;/a&gt;&lt;/p&gt;
&lt;/center&gt;

&lt;h3&gt;
  
  
  Webhooks
&lt;/h3&gt;

&lt;p&gt;Flux is pull-based by design (i.e., it identifies changes directly from the source) and is good at managing drift in clusters, because it is easier to correct the state of a cluster from the inside than from the outside, where your tool doesn’t have an accurate view of the cluster's current state.  &lt;/p&gt;

&lt;p&gt;If you want your pipeline to be as responsive as a push-based process, you can set up webhook receivers so that a git push triggers reconciliation immediately. &lt;/p&gt;
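
&lt;p&gt;A sketch of such a receiver using Flux's notification controller (resource names and the secret name are illustrative; the git host calls the generated webhook URL on each push):&lt;/p&gt;

```yaml
apiVersion: notification.toolkit.fluxcd.io/v1beta1
kind: Receiver
metadata:
  name: github-receiver
  namespace: flux-system
spec:
  type: github
  events:
    - ping
    - push
  secretRef:
    name: webhook-token     # shared secret used to validate payloads
  resources:
    - kind: GitRepository
      name: flux-system     # source to reconcile when the hook fires
```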

&lt;h3&gt;
  
  
  Alerting and notifications
&lt;/h3&gt;

&lt;p&gt;Flux can notify you about resource status changes, such as the health of a new app version. You can receive alerts about reconciliation failures in clusters and configure different reporting mediums, such as Slack channels or embedded git commit statuses. This helps the developer team know whether the new version of the app was deployed and whether it is healthy.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6db3n3xv33ieip8b505f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6db3n3xv33ieip8b505f.png" alt="Flux Slack Error Alerts"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt;
&lt;p&gt;Source: &lt;a href="https://fluxcd.io/img/slack-error-alert.png" rel="noopener noreferrer"&gt; Flux Slack Error Alerts &lt;/a&gt;&lt;/p&gt;
&lt;/center&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs4bebu8xybtvt0aaphj8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs4bebu8xybtvt0aaphj8.png" alt="Setup Notifications | Flux"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt;
&lt;p&gt;Source: &lt;a href="https://fluxcd.io/docs/guides/notifications/" rel="noopener noreferrer"&gt; Setup Notifications | Flux &lt;/a&gt;&lt;/p&gt;
&lt;/center&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F93e0wc2vri42q4ujeusz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F93e0wc2vri42q4ujeusz.png" alt="Notification Controller | Flux"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt;
&lt;p&gt;Source: &lt;a href="https://fluxcd.io/docs/components/notification/" rel="noopener noreferrer"&gt; Notification Controller | Flux &lt;/a&gt;&lt;/p&gt;
&lt;/center&gt;
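
&lt;p&gt;A sketch of the Slack wiring with a Provider and an Alert (channel, names, and event sources are illustrative; the Slack webhook URL lives in the referenced secret):&lt;/p&gt;

```yaml
apiVersion: notification.toolkit.fluxcd.io/v1beta1
kind: Provider
metadata:
  name: slack
  namespace: flux-system
spec:
  type: slack
  channel: general
  secretRef:
    name: slack-url        # secret holding the Slack webhook address
---
apiVersion: notification.toolkit.fluxcd.io/v1beta1
kind: Alert
metadata:
  name: on-call
  namespace: flux-system
spec:
  providerRef:
    name: slack
  eventSeverity: error     # only forward failures
  eventSources:
    - kind: Kustomization
      name: '*'
    - kind: HelmRelease
      name: '*'
```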

&lt;h3&gt;
  
  
  Authorization Methods
&lt;/h3&gt;

&lt;p&gt;Flux relies on the RBAC capabilities of Kubernetes and does not have its own authorization management; authentication and authorization are handled entirely through Kubernetes RBAC. This can be a downside if you want to provide authorization using SSO.&lt;/p&gt;

&lt;h3&gt;
  
  
  User Interface
&lt;/h3&gt;

&lt;p&gt;There is no official UI for Flux. It does have an &lt;a href="https://github.com/fluxcd/webui" rel="noopener noreferrer"&gt;experimental UI&lt;/a&gt;, but it is not under active development at the time of writing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Flagger
&lt;/h2&gt;

&lt;p&gt;So far, we have automated our delivery process to the cluster, with alerts and notifications in case of failures or an unhealthy cluster state. Now, we will look at how Flagger integrates into this process, enables different deployment strategies, and helps with Progressive Delivery.&lt;/p&gt;

&lt;p&gt;Even with the best alerting and notifications in place, we are still not resilient to downtime caused by new releases. How can we be sure our mission-critical services work as expected? A bad release can cause a colossal loss of business value. For example, your team might want to test a new feature on a small sample of users and, if the feature performs well, roll it out to all users.&lt;/p&gt;

&lt;p&gt;To do that without hindering day-to-day activities, Flagger lets us automate the release process and reduces the risk of introducing a new release in production by gradually shifting traffic to the new release while measuring metrics and running conformance tests. &lt;/p&gt;

&lt;p&gt;Example of Progressive Delivery with Flagger.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk6qvi0c5ldm59pqsv6v6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk6qvi0c5ldm59pqsv6v6.png" alt="stefanprodan/gitops-progressive-delivery"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt;
&lt;p&gt;Source: &lt;a href="https://github.com/stefanprodan/gitops-progressive-delivery" rel="noopener noreferrer"&gt; stefanprodan/gitops-progressive-delivery &lt;/a&gt;&lt;/p&gt;
&lt;/center&gt;

&lt;h3&gt;
  
  
  Configuration
&lt;/h3&gt;

&lt;p&gt;Flagger is compatible with any CI/CD solution, so it can be used with Flux, Jenkins, Carvel, Argo, etc. It supports various service meshes such as App Mesh, Istio, Linkerd, Kuma, and Open Service Mesh, as well as ingress controllers like Contour, Gloo, NGINX, Skipper, and Traefik. It has excellent compatibility with Linkerd, and it's reasonably easy to get started with canary releases and metrics analysis.&lt;/p&gt;

&lt;p&gt;Another important factor that often comes into the picture: Flagger doesn’t require replacing Deployment objects with any custom resource type.&lt;/p&gt;

&lt;h3&gt;
  
  
  Deployment Strategies
&lt;/h3&gt;

&lt;p&gt;Flagger implements several deployment strategies that all share the same objective: shifting traffic gradually to a new version of the release. Some of the strategies are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Canary Releases&lt;/li&gt;
&lt;li&gt;A/B Testing&lt;/li&gt;
&lt;li&gt;Blue/Green&lt;/li&gt;
&lt;li&gt;Blue/Green Mirroring&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Check out &lt;a href="https://fluxcd.io/flagger/usage/deployment-strategies/" rel="noopener noreferrer"&gt;Flux CD official docs to know more about deployment strategies&lt;/a&gt;.&lt;/p&gt;
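
&lt;p&gt;A sketch of a canary release configured this way (the target name, port, and thresholds below are illustrative; &lt;code&gt;request-success-rate&lt;/code&gt; and &lt;code&gt;request-duration&lt;/code&gt; are Flagger's built-in metrics):&lt;/p&gt;

```yaml
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: podinfo
  namespace: apps
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  service:
    port: 9898
  analysis:
    interval: 1m       # time between traffic-shift steps
    threshold: 5       # failed checks before automatic rollback
    maxWeight: 50      # max traffic percentage sent to the canary
    stepWeight: 10     # traffic increase per step
    metrics:
      - name: request-success-rate
        thresholdRange:
          min: 99      # at least 99% of requests must succeed
        interval: 1m
      - name: request-duration
        thresholdRange:
          max: 500     # p99 latency in milliseconds
        interval: 1m
```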

&lt;h3&gt;
  
  
  Metrics
&lt;/h3&gt;

&lt;p&gt;Flagger comes with built-in metrics and a Grafana dashboard for canary analysis. It also exposes Prometheus metrics so you can dig deeper into the analysis, and you can create custom metrics to use in the metrics analysis for a release.&lt;/p&gt;

&lt;p&gt;That’s the beauty of it: once Flagger validates service level objectives such as response time, or any other metric specific to the app, it promotes the release; otherwise, the release is automatically rolled back with minimal impact on end users. We’re not diving into the details of how the metrics template can be used in the analysis step.&lt;/p&gt;
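
&lt;p&gt;As a sketch, a custom metric is defined with a MetricTemplate that runs a Prometheus query (the metric name, address, and query below are illustrative; Flagger substitutes variables like &lt;code&gt;{{ interval }}&lt;/code&gt; at analysis time):&lt;/p&gt;

```yaml
apiVersion: flagger.app/v1beta1
kind: MetricTemplate
metadata:
  name: not-found-rate
  namespace: flagger-system
spec:
  provider:
    type: prometheus
    address: http://prometheus.monitoring:9090
  query: |
    100 - sum(
      rate(http_requests_total{status!="404"}[{{ interval }}])
    ) / sum(
      rate(http_requests_total[{{ interval }}])
    ) * 100
```

&lt;p&gt;The template is then referenced from a canary's &lt;code&gt;analysis.metrics&lt;/code&gt; list with a threshold range, just like the built-in metrics.&lt;/p&gt;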

&lt;h3&gt;
  
  
  Manual Gating
&lt;/h3&gt;

&lt;p&gt;Beyond metrics-based approval, you can perform manual gating to have more control over your canary analysis. There are different kinds of webhooks that you can leverage at each step of the canary analysis, for example, confirm-rollout and confirm-promotion. Flagger will halt the canary traffic shifting and analysis until the confirm webhook returns HTTP status 200.&lt;/p&gt;

&lt;p&gt;Flagger also comes with load testing that can generate traffic during analysis.&lt;br&gt;
You can read more about &lt;a href="https://docs.flagger.app/usage/webhooks" rel="noopener noreferrer"&gt;Webhooks - Flagger&lt;/a&gt;. &lt;/p&gt;
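
&lt;p&gt;A sketch of both ideas inside a canary's analysis section (the webhook names and URLs below are illustrative, based on Flagger's load tester service):&lt;/p&gt;

```yaml
# excerpt of a Canary spec's analysis section
analysis:
  webhooks:
    - name: ask-for-approval
      type: confirm-rollout      # halts the rollout until the gate is open
      url: http://flagger-loadtester.test/gate/check
    - name: load-test
      type: rollout              # generates traffic during analysis
      url: http://flagger-loadtester.test/
      metadata:
        cmd: "hey -z 1m -q 10 -c 2 http://podinfo-canary.apps:9898/"
```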

&lt;p&gt;Let’s look at developer experience for both tools.&lt;/p&gt;

&lt;h2&gt;
  
  
  Developer Experience
&lt;/h2&gt;

&lt;p&gt;Flux and Flagger both have a steep learning curve and a lot of functionality, which means more power but can sometimes overwhelm developers. Neither has a UI.&lt;br&gt;
The setup experience is pretty straightforward. As for the logging experience, you will need to get your hands dirty in the CLI, whereas other tools may offer a UI that shows the current progress of a deployment, which makes life easier for a lot of developers.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;We looked at both tools, how they fit into our CI/CD pipeline, and how they help us deliver progressively. With Flagger, we can split traffic into proportions, which helps us test a new release on a subset of users and gather feedback before deciding whether it should be rolled out to all users. &lt;/p&gt;

&lt;p&gt;I hope you learned how these tools fit into GitOps with Progressive Delivery practice.&lt;/p&gt;

&lt;p&gt;If you are looking to switch to &lt;a href="https://www.infracloud.io/ci-cd-consulting/" rel="noopener noreferrer"&gt;Progressive Delivery with GitOps, talk to our CI/CD experts&lt;/a&gt;, who can help you not only suggest but also implement such a solution end to end.&lt;/p&gt;

&lt;h2&gt;
  
  
  References and further reading
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.infoq.com/podcasts/flux-flagger-operator-pattern/" rel="noopener noreferrer"&gt;Stefan Prodan on Flux, Flagger and the Operator pattern&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.solo.io/blog/glooops-progressive-delivery-the-gitops-way/" rel="noopener noreferrer"&gt;GlooOps: Progressive delivery, the GitOps way&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.gitops.tech/" rel="noopener noreferrer"&gt;GitOps&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.weave.works/technologies/gitops/" rel="noopener noreferrer"&gt;Guide to GitOps&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>devops</category>
      <category>gitops</category>
      <category>kubernetes</category>
      <category>git</category>
    </item>
  </channel>
</rss>
