<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Stephen Barlow</title>
    <description>The latest articles on DEV Community by Stephen Barlow (@stephenbarlow).</description>
    <link>https://dev.to/stephenbarlow</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1175447%2Fd10ff11d-0bd8-48d7-bb51-4d47883e969f.jpeg</url>
      <title>DEV Community: Stephen Barlow</title>
      <link>https://dev.to/stephenbarlow</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/stephenbarlow"/>
    <language>en</language>
    <item>
      <title>How Render Scaled Knative to Support 100k+ Free-Tier Apps</title>
      <dc:creator>Stephen Barlow</dc:creator>
      <pubDate>Tue, 03 Oct 2023 18:37:29 +0000</pubDate>
      <link>https://dev.to/render/how-render-scaled-knative-to-support-100k-free-tier-apps-5g3g</link>
      <guid>https://dev.to/render/how-render-scaled-knative-to-support-100k-free-tier-apps-5g3g</guid>
      <description>&lt;p&gt;&lt;strong&gt;By Hieu Nguyen - October 3, 2023&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In November 2021, Render introduced a &lt;a href="https://render.com/docs/free" rel="noopener noreferrer"&gt;free tier&lt;/a&gt; for hobbyist developers and teams who want to kick the tires. Adoption grew at a steady, predictable rate—until Heroku announced the end of &lt;em&gt;their&lt;/em&gt; free offering ten months later:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2bal39z76f3nvuhtln53.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2bal39z76f3nvuhtln53.png" alt="Graph of free-tier app creation" width="800" height="563"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Render's free-tier adoption rate doubled immediately and grew from there&lt;/strong&gt; (awesome), causing our infrastructure to creak under the load (less awesome). In the span of a month, we experienced four incidents related to this surge. We knew that if Free usage continued to grow (and it very much has: as of this writing, &lt;strong&gt;tens of thousands&lt;/strong&gt; of free-tier apps are created each week), we would need to make our infrastructure much more scalable. This post describes the first step we took along that path.&lt;/p&gt;

&lt;h2&gt;
  
  
  How we initially built Free
&lt;/h2&gt;

&lt;p&gt;Some background: unlike other services on Render, free-tier web services "scale to zero" (as in, they stop running) if they go 15 minutes without receiving traffic. They start up again whenever they next receive an incoming request. This hibernation behavior helps us provide a no-cost offering without breaking the bank.&lt;/p&gt;

&lt;p&gt;However, this desired behavior presented an immediate development challenge. Render uses Kubernetes (K8s) behind the scenes, and K8s didn't natively support scale-to-zero (&lt;a href="https://github.com/kubernetes/enhancements/pull/2022" rel="noopener noreferrer"&gt;it still doesn't, as of September 2023&lt;/a&gt;). In looking for a solution that did, we found and settled on &lt;a href="https://knative.dev/docs/" rel="noopener noreferrer"&gt;Knative&lt;/a&gt; (kay-NAY-tiv). Knative extended Kubernetes with serverless support—a natural fit for services that would regularly spin up and down.&lt;/p&gt;

&lt;p&gt;In the interest of shipping quickly, we deployed Knative with its default configuration. And, until our growth spurt nearly a year later, those defaults worked without issue.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where we hit a wall
&lt;/h2&gt;

&lt;p&gt;With the free-tier surge, the total number of apps on Render effectively quadrupled. This put significant strain on the networking layer of each of our Kubernetes clusters. To understand the nature of that strain, let's look at how this layer operates.&lt;/p&gt;

&lt;p&gt;Two networking components run on every node in every cluster: &lt;a href="https://github.com/projectcalico/calico" rel="noopener noreferrer"&gt;Calico&lt;/a&gt; and &lt;a href="https://kubernetes.io/docs/reference/command-line-tools-reference/kube-proxy/" rel="noopener noreferrer"&gt;kube-proxy&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Calico&lt;/strong&gt; mainly takes care of IP address management, or IPAM: assigning IP addresses to Pods and Services (we're using capital-S &lt;strong&gt;Service&lt;/strong&gt; to refer to a &lt;a href="https://kubernetes.io/docs/concepts/services-networking/service/" rel="noopener noreferrer"&gt;Kubernetes Service&lt;/a&gt;, to distinguish it from the services that customers create on Render). It also enforces &lt;a href="https://kubernetes.io/docs/concepts/services-networking/network-policies/" rel="noopener noreferrer"&gt;Network Policies&lt;/a&gt; by managing iptables rules on the node.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;kube-proxy&lt;/strong&gt; configures a different set of routing rules on the node to ensure traffic destined for a Service is load-balanced across all backing Pods.&lt;/p&gt;

&lt;p&gt;Both of these components do their jobs by listening for creates, updates, and deletes to all Pods and Services in the cluster. As you can imagine, having &lt;em&gt;more&lt;/em&gt; Pods and Services that changed &lt;em&gt;more&lt;/em&gt; frequently resulted in &lt;em&gt;more&lt;/em&gt; work:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;More work meant more CPU consumption.&lt;/strong&gt; Remember, both Calico and kube-proxy run on &lt;em&gt;every&lt;/em&gt; node. The more CPU these components used, the less we had left to run our customers' apps.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;More work meant higher update latency.&lt;/strong&gt; As the work queue grew, each networking change took longer to propagate due to increased time spent waiting in the queue. This delay is defined as the &lt;strong&gt;network programming latency&lt;/strong&gt;, or NPL (read more about NPL &lt;a href="https://github.com/kubernetes/community/blob/master/sig-scalability/slos/network_programming_latency.md" rel="noopener noreferrer"&gt;here&lt;/a&gt;). When there was high NPL, traffic could be routed using stale rules that led nowhere (the Pod had already been destroyed), causing connections to fail intermittently.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
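
&lt;p&gt;To make the queueing effect concrete, here is a minimal sketch in Python. It is a toy model only: it assumes a single first-in, first-out worker with a fixed cost per update, which is not how Calico or kube-proxy are actually implemented. The function name and the numbers are hypothetical.&lt;/p&gt;

```python
def programming_latencies(arrival_times, seconds_per_update):
    """Seconds each update waits from arrival until one FIFO worker applies it.

    Toy model (assumption): one worker, fixed processing cost per update.
    """
    latencies = []
    worker_free_at = 0.0
    for arrived in arrival_times:
        # An update cannot start before it arrives or before the worker is free.
        started = max(arrived, worker_free_at)
        worker_free_at = started + seconds_per_update
        latencies.append(worker_free_at - arrived)
    return latencies


# Updates arriving slower than they can be processed: latency stays flat.
calm = programming_latencies([0.0, 1.0, 2.0, 3.0], seconds_per_update=0.5)
# Updates arriving faster than they can be processed: latency keeps growing.
busy = programming_latencies([0.0, 0.25, 0.5, 0.75], seconds_per_update=0.5)
print(calm)  # [0.5, 0.5, 0.5, 0.5]
print(busy)  # [0.5, 0.75, 1.0, 1.25]
```

&lt;p&gt;In the second run, each update waits longer than the one before it. That growing delay is the window in which a node can still route traffic using rules for a Pod that no longer exists.&lt;/p&gt;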

&lt;p&gt;To mitigate these issues, we needed to reduce the overhead each free-tier app added to our networking machinery.&lt;/p&gt;

&lt;h2&gt;
  
  
  "Serviceless" Knative
&lt;/h2&gt;

&lt;p&gt;As mentioned, we'd deployed out-of-the-box Knative to handle free-tier resource provisioning. We took a closer look at exactly &lt;em&gt;what&lt;/em&gt; K8s primitives were being provisioned for each free-tier app:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;One Pod (for running the application code). Expected.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;2N + 1&lt;/code&gt; Services, where &lt;code&gt;N&lt;/code&gt; is the number of times the app was deployed. This is because Knative manages changes with &lt;a href="https://knative.dev/docs/concepts/serving-resources/revisions/" rel="noopener noreferrer"&gt;Revisions&lt;/a&gt;, and retained resources belonging to historical Revisions. &lt;em&gt;Un&lt;/em&gt;expected.&lt;/li&gt;
&lt;/ul&gt;
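
&lt;p&gt;As a rough illustration of how quickly &lt;code&gt;2N + 1&lt;/code&gt; adds up, the Python sketch below computes Service counts per app and across a small fleet. The breakdown into two Services per retained Revision plus one per app is an assumed reading of the formula, and the function names and deploy counts are hypothetical.&lt;/p&gt;

```python
def services_per_app(deploys):
    """Kubernetes Services kept for one app after `deploys` deployments: 2N + 1."""
    per_revision = 2  # assumption: two Services for each retained Revision
    per_app = 1       # assumption: one Service that exists once per app
    return per_revision * deploys + per_app


def fleet_services(deploys_per_app):
    """Total Services across a fleet, given each app's deploy count."""
    return sum(services_per_app(n) for n in deploys_per_app)


print(services_per_app(1))          # 3
print(services_per_app(5))          # 11
print(fleet_services([1, 5, 10]))   # 3 + 11 + 21 = 35
```

&lt;p&gt;Three apps with a combined 16 deploys already account for 35 Services in this sketch, which is why the count grows much faster than the number of apps.&lt;/p&gt;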

&lt;p&gt;We figured the Pod needed to stay, but did we really need all those Kubernetes Services? What if we could get away with fewer—or even zero?&lt;/p&gt;

&lt;p&gt;We dove deeper into how those resources interacted in a cluster:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjwv6sdfqf5xdvzrlugo4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjwv6sdfqf5xdvzrlugo4.png" alt="Knative defaults in Render K8s clusters" width="800" height="439"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And learned what each of the Knative-provisioned Services (in purple above) was for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The &lt;strong&gt;Placeholder Service&lt;/strong&gt; was a dummy Service that existed to prevent naming collisions among resources for Knative-managed apps. There was one for every free-tier app.&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;Public Service&lt;/strong&gt; routed incoming traffic to the app from the &lt;em&gt;public internet&lt;/em&gt;.&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;Private Service&lt;/strong&gt; routed incoming &lt;em&gt;cluster-local&lt;/em&gt; traffic based on whether the app was scaled up.

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;If scaled up&lt;/strong&gt;, traffic was routed to the Pod.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;If scaled down&lt;/strong&gt;, traffic was routed to the cluster's Knative proxy (called the &lt;a href="https://knative.dev/docs/serving/knative-kubernetes-services/#service-activator" rel="noopener noreferrer"&gt;activator&lt;/a&gt;), which handled scaling up the app by creating a Pod.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;Armed with this newfound knowledge, we devised a path to remove all of these Services.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step by step
&lt;/h3&gt;

&lt;p&gt;We started simple with the dummy &lt;strong&gt;Placeholder Service&lt;/strong&gt;, which did &lt;em&gt;literally nothing&lt;/em&gt;. There was no risk of naming collisions among our Knative-managed resources, so we updated the Knative Route controller to stop creating the Placeholder Service. ❌&lt;/p&gt;

&lt;p&gt;Next! While the &lt;strong&gt;Public Service&lt;/strong&gt; (for public internet routing) is needed for plenty of Knative use cases out there, in Render-land, all requests from the public internet must pass through our load-balancing layer. This means requests are guaranteed to be &lt;em&gt;cluster-local&lt;/em&gt; by the time they reach Pods, so the &lt;strong&gt;Public Service&lt;/strong&gt; &lt;em&gt;also&lt;/em&gt; had nothing to do! We patched Knative to stop reconciling it and its related Endpoint resources. ❌&lt;/p&gt;

&lt;p&gt;Finally, the &lt;strong&gt;Private Service&lt;/strong&gt; (for cluster-local routing). Services exist to balance load across backing Pods, but a free-tier app can have at most one Pod receiving traffic at a time, which makes load balancing &lt;em&gt;slightly&lt;/em&gt; unnecessary. Removing this Service required two changes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Streamline traffic to flow exclusively through the activator, since we would no longer have a Service to route traffic to the Pod when the app was scaled up. With a little experimentation, we discovered that the activator could both wake Pods &lt;em&gt;and&lt;/em&gt; reverse-proxy to a Pod that was already awake, even though that behavior wasn't documented! We just needed to set the right headers.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Patch the activator to listen for changes to Pod readiness states, and route directly to Pod IP addresses (thanks, Calico!). By default, the activator listens for changes to EndpointSlices, but those are tied to the Services we were hoping to delete.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
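
&lt;p&gt;The two changes above can be pictured with a small Python model. This is a sketch under stated assumptions, not Knative's code or API: &lt;code&gt;ToyActivator&lt;/code&gt;, &lt;code&gt;PodEvent&lt;/code&gt;, and the &lt;code&gt;wake&lt;/code&gt; callback are hypothetical names, and a real proxy would hold the request until the Pod is ready rather than return nothing.&lt;/p&gt;

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class PodEvent:
    """Hypothetical readiness change for the single Pod behind an app."""
    app: str
    pod_ip: str
    ready: bool


@dataclass
class ToyActivator:
    """Toy stand-in for a proxy that routes by Pod IP instead of via a Service."""
    wake: Callable[[str], None]  # hypothetical hook that scales the app up
    ready_pods: Dict[str, str] = field(default_factory=dict)

    def on_pod_event(self, event: PodEvent) -> None:
        # Track Pod readiness directly, with no Service or EndpointSlice involved.
        if event.ready:
            self.ready_pods[event.app] = event.pod_ip
        elif self.ready_pods.get(event.app) == event.pod_ip:
            del self.ready_pods[event.app]

    def route(self, app: str) -> Optional[str]:
        # Every request passes through here, whether the app is awake or not.
        pod_ip = self.ready_pods.get(app)
        if pod_ip is None:
            self.wake(app)  # scaled to zero: ask for a Pod to be created
        return pod_ip       # awake: the Pod IP to reverse-proxy to


woken = []
activator = ToyActivator(wake=woken.append)
print(activator.route("blog"))  # None, and "blog" is recorded in woken
activator.on_pod_event(PodEvent("blog", "10.0.0.7", ready=True))
print(activator.route("blog"))  # 10.0.0.7
```

&lt;p&gt;The point of the model is that a single component holds the app-to-Pod mapping, so there is no per-app Service for every node's networking components to track.&lt;/p&gt;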

&lt;p&gt;And just like that, the &lt;strong&gt;Private Service&lt;/strong&gt; was no more. ❌&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Want to go deeper under the hood? Check out an abridged version of &lt;a href="https://render.com/blog/knative-design-doc" rel="noopener noreferrer"&gt;the design doc&lt;/a&gt; for removing the &lt;strong&gt;Private Service&lt;/strong&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;At the end of this entire optimization pass, the networking architecture for a free-tier app had been simplified to the following:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frnjezl9jiucxttzjkuu9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frnjezl9jiucxttzjkuu9.png" alt="Free-tier architecture after Knative Service removal" width="656" height="306"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Zero&lt;/em&gt; Kubernetes Services per free-tier app! Predictably, K8s Service counts plummeted across our clusters:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj014v8xpr7iurd5vsrij.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj014v8xpr7iurd5vsrij.png" alt="Chart of Service count by cluster over time" width="800" height="337"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With these improvements, Calico and kube-proxy's combined usage fell by &lt;strong&gt;hundreds of CPU seconds&lt;/strong&gt; in our largest cluster.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feig0jos5mwknqm1h42kt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feig0jos5mwknqm1h42kt.png" alt="Chart of CPU usage over time" width="570" height="834"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With compute resources freed up, free-tier network latency and stability improved dramatically. But even so, we knew we had more work to do.&lt;/p&gt;

&lt;h2&gt;
  
  
  A moving target
&lt;/h2&gt;

&lt;p&gt;Our Knative tweaks bought us some much-needed breathing room, but ultimately, free-tier usage began to put a strain even on this optimized architecture. The time was quickly approaching for us to rip out Knative entirely, in favor of a home-grown solution that was tailor-made for Render's needs.&lt;/p&gt;

&lt;p&gt;But that's a story for another post!&lt;/p&gt;

&lt;p&gt;--&lt;/p&gt;

&lt;h3&gt;
  
  
  Related reading
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://render.com/blog/knative-design-doc" rel="noopener noreferrer"&gt;Design doc for removing the &lt;strong&gt;Private Service&lt;/strong&gt;&lt;/a&gt; (abridged)&lt;/li&gt;
&lt;li&gt;&lt;a href="https://knative.dev/docs/concepts/" rel="noopener noreferrer"&gt;Knative documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://render.com/careers" rel="noopener noreferrer"&gt;Careers at Render&lt;/a&gt; 😉&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>kubernetes</category>
      <category>knative</category>
    </item>
  </channel>
</rss>
