<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Roman Belshevitz</title>
    <description>The latest articles on DEV Community by Roman Belshevitz (@rbelshevitz).</description>
    <link>https://dev.to/rbelshevitz</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F740774%2F6bfe5f6f-8b94-487a-b524-b407e715aefc.png</url>
      <title>DEV Community: Roman Belshevitz</title>
      <link>https://dev.to/rbelshevitz</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/rbelshevitz"/>
    <language>en</language>
    <item>
      <title>Das U-Boot: from Power-On to initrd</title>
      <dc:creator>Roman Belshevitz</dc:creator>
      <pubDate>Wed, 07 May 2025 15:01:05 +0000</pubDate>
      <link>https://dev.to/rbelshevitz/from-power-on-to-initrd-with-u-boot-35da</link>
      <guid>https://dev.to/rbelshevitz/from-power-on-to-initrd-with-u-boot-35da</guid>
      <description>&lt;p&gt;&lt;a href="https://github.com/u-boot/u-boot" rel="noopener noreferrer"&gt;Das U-Boot&lt;/a&gt; is a well-known bootloader which brings embedded Linux devices to life since 1999, it turns 25 this year. &lt;/p&gt;

&lt;p&gt;Today's post walks through the complete U-Boot boot process, covering everything from SoC power-on to launching the Linux &lt;code&gt;initrd&lt;/code&gt;, along with hardware-specific gotchas.&lt;/p&gt;




&lt;h2&gt;
  
  
  The boot flow
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="o"&gt;[&lt;/span&gt;Power On]
   ↓
&lt;span class="o"&gt;[&lt;/span&gt;SoC BootROM] → &lt;span class="o"&gt;[&lt;/span&gt;SPL]
   ↓
&lt;span class="o"&gt;[&lt;/span&gt;U-Boot Proper]
   ↓
&lt;span class="o"&gt;[&lt;/span&gt;Loads kernel + dtb + initrd]
   ↓
&lt;span class="o"&gt;[&lt;/span&gt;bootz / booti]
   ↓
&lt;span class="o"&gt;[&lt;/span&gt;Linux Kernel starts]
   ↓
&lt;span class="o"&gt;[&lt;/span&gt;initrd: /init runs]
   ↓
&lt;span class="o"&gt;[&lt;/span&gt;Switch to rootfs]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  🔌 1. Power-On: SoC BootROM
&lt;/h3&gt;

&lt;p&gt;Every SoC contains a hardcoded BootROM:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Executes right after power-on or reset&lt;/li&gt;
&lt;li&gt;Detects the boot source via boot pins&lt;/li&gt;
&lt;li&gt;Loads the Secondary Program Loader (SPL) from flash, SD, UART, or USB&lt;/li&gt;
&lt;li&gt;Minimal hardware is initialized here&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  📦 2. SPL (Secondary Program Loader)
&lt;/h3&gt;

&lt;p&gt;SPL is a tiny version of U-Boot:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Brings up DRAM and essential regulators&lt;/li&gt;
&lt;li&gt;Loads full U-Boot into RAM&lt;/li&gt;
&lt;li&gt;Operates within very constrained SRAM limits&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 &lt;em&gt;Hardware nuance:&lt;/em&gt; If DRAM init fails here, the system will hang silently. Always verify timing parameters for your DDR/LPDDR.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h3&gt;
  
  
  🚀 3. U-Boot Proper
&lt;/h3&gt;

&lt;p&gt;Now the full U-Boot runs from RAM:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Initializes serial console, eMMC/SD, network, USB, etc.&lt;/li&gt;
&lt;li&gt;Parses environment variables (&lt;code&gt;bootargs&lt;/code&gt;, &lt;code&gt;bootcmd&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Loads:

&lt;ul&gt;
&lt;li&gt;Linux kernel (&lt;code&gt;zImage&lt;/code&gt; or &lt;code&gt;Image&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Device Tree Blob (&lt;code&gt;.dtb&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;initrd (&lt;code&gt;initramfs&lt;/code&gt;)&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Starts the boot using &lt;code&gt;bootz&lt;/code&gt; or &lt;code&gt;booti&lt;/code&gt;
&lt;/li&gt;

&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;bootz &lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;kernel_addr_r&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt; &lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;ramdisk_addr_r&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt; &lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;fdt_addr_r&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Regarding boot device order, modern U-Boot uses the so-called "Distro Boot" framework:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;bootcmd&lt;/code&gt; runs &lt;code&gt;distro_bootcmd&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;distro_bootcmd&lt;/code&gt; iterates over &lt;code&gt;boot_targets&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The &lt;code&gt;distro_bootcmd&lt;/code&gt; mechanism in U-Boot supports booting using configuration files like &lt;code&gt;extlinux.conf&lt;/code&gt; and scripts like &lt;code&gt;boot.scr&lt;/code&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;extlinux.conf&lt;/code&gt;: A configuration file that specifies kernel, initrd, and device tree paths, allowing for multiple boot entries and parameters.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;boot.scr&lt;/code&gt;: A compiled script containing U-Boot commands, offering a way to automate complex boot sequences.&lt;/li&gt;
&lt;/ul&gt;
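
&lt;p&gt;A minimal &lt;code&gt;extlinux.conf&lt;/code&gt; might look like this (the label, file paths, console device, and root partition below are illustrative; adjust them to your board):&lt;/p&gt;

```plaintext
DEFAULT primary
TIMEOUT 30

LABEL primary
  KERNEL /boot/zImage
  FDT /boot/board.dtb
  INITRD /boot/initramfs.cpio.gz
  APPEND console=ttyS0,115200 root=/dev/mmcblk0p2 rw
```

&lt;p&gt;U-Boot's distro boot scans each boot partition for this file at &lt;code&gt;/extlinux/extlinux.conf&lt;/code&gt; or &lt;code&gt;/boot/extlinux/extlinux.conf&lt;/code&gt;.&lt;/p&gt;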

&lt;p&gt;On each candidate device, U-Boot looks for known boot files: &lt;code&gt;extlinux.conf&lt;/code&gt;, &lt;code&gt;boot.scr&lt;/code&gt;, &lt;code&gt;boot.ini&lt;/code&gt;, or standard kernel/initrd/fdt triplets. The usual boot device order is first eMMC/SD, then first USB, then first NVMe.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;setenv&lt;/code&gt; command in U-Boot is used to define or override environment variables, which control nearly every aspect of how U-Boot boots your system.&lt;/p&gt;

&lt;p&gt;🔧 Basic Syntax&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;setenv variable_name value
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;setenv bootargs &lt;span class="s2"&gt;"console=ttyS0,115200 root=/dev/mmcblk0p2 rw"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To persist changes, run &lt;code&gt;saveenv&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;This writes changes to persistent storage (e.g., SPI NOR, eMMC, NAND, or a UBI volume). If you don't run &lt;code&gt;saveenv&lt;/code&gt;, any updates you may have made to the U-Boot environment will be lost on next system reboot.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 Use &lt;code&gt;printenv&lt;/code&gt; and &lt;code&gt;bdinfo&lt;/code&gt; to check variables and memory layout. Be careful: U-Boot does not validate variable syntax, so typos can silently break the boot! To recover from a bad &lt;code&gt;bootcmd&lt;/code&gt; or &lt;code&gt;bootargs&lt;/code&gt;, use the serial console and interrupt boot early (usually by pressing a key).&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h3&gt;
  
  
  🐧 4. Linux Kernel Boot
&lt;/h3&gt;

&lt;p&gt;The kernel is now in control:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Unpacks itself into RAM&lt;/li&gt;
&lt;li&gt;Parses command-line and DTB&lt;/li&gt;
&lt;li&gt;Mounts &lt;code&gt;initrd&lt;/code&gt; as the rootfs&lt;/li&gt;
&lt;li&gt;Runs &lt;code&gt;/init&lt;/code&gt; script from initrd&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 If any addresses (kernel/initrd/fdt) overlap in RAM, the kernel &lt;strong&gt;may crash&lt;/strong&gt;. Always double-check memory layout!&lt;/p&gt;
&lt;/blockquote&gt;
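
&lt;p&gt;The overlap rule itself is easy to check on any host before you bake addresses into the environment. A minimal sketch in plain shell; the kernel and initrd addresses and sizes below are made up for illustration:&lt;/p&gt;

```shell
# Succeeds (exit 0) if [startA, startA+sizeA) overlaps [startB, startB+sizeB)
ranges_overlap() {
  a_end=$(( $1 + $2 ))
  b_end=$(( $3 + $4 ))
  if [ "$a_end" -le "$3" ]; then return 1; fi
  if [ "$b_end" -le "$1" ]; then return 1; fi
  return 0
}

# Hypothetical layout: 8 MiB kernel at 0x80200000, 16 MiB initrd at 0x84000000
if ranges_overlap $((0x80200000)) $((0x800000)) $((0x84000000)) $((0x1000000)); then
  echo "OVERLAP: adjust kernel_addr_r / ramdisk_addr_r"
else
  echo "layout OK"
fi
```

&lt;p&gt;With the example numbers above the script prints &lt;code&gt;layout OK&lt;/code&gt;; don't forget that the decompressed kernel and the unpacked initramfs are larger than the files you load.&lt;/p&gt;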




&lt;h3&gt;
  
  
  🐧 5. What Happens in initrd?
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;initrd&lt;/code&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Loads drivers/modules&lt;/li&gt;
&lt;li&gt;Mounts the actual root filesystem (e.g., &lt;code&gt;/dev/mmcblk0p2&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;May perform decryption, overlay setup, or network discovery&lt;/li&gt;
&lt;li&gt;Executes &lt;code&gt;switch_root&lt;/code&gt; or &lt;code&gt;pivot_root&lt;/code&gt; to hand off to your real system (e.g., &lt;code&gt;systemd&lt;/code&gt;)&lt;/li&gt;
&lt;/ul&gt;
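
&lt;p&gt;Put together, a schematic &lt;code&gt;/init&lt;/code&gt; covering these duties might look like the sketch below. It is only meant to run as PID 1 inside an initramfs on the target, never on a build host, and the module and device names are placeholders:&lt;/p&gt;

```shell
#!/bin/sh
# Schematic initramfs /init -- a sketch, not a drop-in script.

# Pseudo-filesystems that early userspace needs
mount -t proc none /proc
mount -t sysfs none /sys
mount -t devtmpfs none /dev

# Load storage drivers shipped inside the initramfs (placeholder module)
modprobe mmc_block || true

# Mount the actual root filesystem (device name is illustrative)
mount /dev/mmcblk0p2 /mnt/root

# Hand off PID 1 to the real init system
exec switch_root /mnt/root /sbin/init
```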




&lt;h2&gt;
  
  
  ⚙️ Hardware-Related Pitfalls
&lt;/h2&gt;

&lt;p&gt;Some common embedded issues:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;DRAM doesn't initialize? &lt;a href="https://docs.u-boot.org/en/latest/develop/memory.html" rel="noopener noreferrer"&gt;Wrong timing in SPL&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Kernel hangs early? &lt;a href="https://lists.denx.de/pipermail/u-boot/2008-May/034686.html" rel="noopener noreferrer"&gt;Overlapping memory regions&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Ethernet/MAC not working? &lt;a href="https://samuel.dionne-riel.com/blog/2024/12/05/dtb-loading-is-harder-than-it-looks.html" rel="noopener noreferrer"&gt;Wrong or mismatched DTB&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;NAND boot fails? &lt;a href="https://adaptivesupport.amd.com/s/question/0D52E00006hpLPbSAM/enabling-ecc-results-in-kernel-panic-during-booting-with-initramfs?language=en_US" rel="noopener noreferrer"&gt;Missing ECC config in U-Boot&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Secure boot halts execution? &lt;a href="https://docs.u-boot.org/en/v2025.01/usage/fit/verified-boot.html" rel="noopener noreferrer"&gt;U-Boot binary not signed&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;Sources: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.u-boot.org/en/latest/" rel="noopener noreferrer"&gt;https://docs.u-boot.org/en/latest/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://samuel.dionne-riel.com/blog/2024/12/05/dtb-loading-is-harder-than-it-looks.html" rel="noopener noreferrer"&gt;https://samuel.dionne-riel.com/blog/2024/12/05/dtb-loading-is-harder-than-it-looks.html&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://xilinx-wiki.atlassian.net/wiki/spaces/A/pages/18842223/U-boot" rel="noopener noreferrer"&gt;Xilinx Wiki: U-boot&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://microchip.my.site.com/s/article/SAMA5D3-EDS---Modifying-Device-Tree-Overlays-in-U-Boot-Prompt-using--fdt--utility" rel="noopener noreferrer"&gt;Microchip: Modifying Device Tree Overlays in U-Boot Prompt using 'fdt' utility&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;Todo: add more context; describe ARM related nuances...&lt;br&gt;
Cover pic: Pixabay / Pexels&lt;/p&gt;

</description>
      <category>linux</category>
      <category>embedded</category>
    </item>
    <item>
      <title>Landing From Clouds: Why On-Premise Will Eventually Win... In a Large Number of Cases</title>
      <dc:creator>Roman Belshevitz</dc:creator>
      <pubDate>Mon, 05 May 2025 21:36:26 +0000</pubDate>
      <link>https://dev.to/rbelshevitz/landing-from-clouds-why-on-premise-will-eventually-win-4j7o</link>
      <guid>https://dev.to/rbelshevitz/landing-from-clouds-why-on-premise-will-eventually-win-4j7o</guid>
      <description>&lt;p&gt;&lt;em&gt;Update: Well, folks, I edited the title. No one spoke so categorically. Clouds will remain in their niches for some time. But these are strictly niches. In the article I explain why the hype will subside.&lt;/em&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;“The computer industry is the only industry that is more fashion-driven than women’s fashion.”&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
&lt;em&gt;Larry Ellison&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In our era, the cloud is the haute couture of infrastructure. Everyone wants it. Everyone says you &lt;em&gt;need&lt;/em&gt; it. And few ask whether it actually fits.&lt;/p&gt;

&lt;p&gt;This post is for those who remember that &lt;a href="https://news.ycombinator.com/item?id=37965142" rel="noopener noreferrer"&gt;not every problem needs 12 layers of abstraction&lt;/a&gt;. It's for careful builders who believe that &lt;strong&gt;owning your tools&lt;/strong&gt; is better than &lt;strong&gt;renting your soul&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;And here are &lt;strong&gt;11 reasons&lt;/strong&gt; why on-premise will outlast the hype.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Cloud Is Renting the Same Hardware for Triple the Price
&lt;/h2&gt;

&lt;p&gt;The cloud sells convenience, but charges premium rent, &lt;strong&gt;forever&lt;/strong&gt;. When you buy servers, you own an asset. When you rent a VM, you feed a meter. The cloud &lt;strong&gt;is like living in a hotel room&lt;/strong&gt; and bragging you don’t have to fix the faucet. &lt;/p&gt;

&lt;p&gt;Sure, but you’re paying 10× the mortgage and still can’t open a window.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. DevOps Became a Way to Work 24/7
&lt;/h2&gt;

&lt;p&gt;DevOps was meant to unify teams. Instead, it blurred boundaries: developer and admin, work and sleep, code and ops. Now every developer is also on call. This "you build it, you run it" culture leads to burnout, not harmony.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Complexity Is Not a Virtue
&lt;/h2&gt;

&lt;p&gt;In the old days, a deployment was &lt;code&gt;make install&lt;/code&gt;. Now it's a procession of YAML files, container registries, ephemeral environments, secrets managers, Terraform pipelines, and CI/CD rituals that often do less than a Bash script with &lt;code&gt;rsync&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The industry calls this &lt;em&gt;“modern infrastructure”&lt;/em&gt;. I call it &lt;em&gt;complexity theater&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Many engineers grew up on the UNIX philosophy — small, composable tools, each doing one job well. But this doesn’t mean they signed up for a balkanized mesh of microservices, &lt;strong&gt;each with its own language, API, database, and dev team&lt;/strong&gt;. Microservices often don’t simplify — they multiply: bugs, failure points, logging silos, and deployment dependencies.&lt;/p&gt;

&lt;p&gt;Abstraction without necessity is just friction in disguise. Ask yourself: Do you need ten services—or just ten functions?&lt;/p&gt;

&lt;h2&gt;
  
  
  4. You Will Regret Vendor Lock-In
&lt;/h2&gt;

&lt;p&gt;When you rent your stack, the landlord can raise the rent—or remove the plumbing. Cloud APIs change, regions go down, and cost models shift. &lt;/p&gt;

&lt;p&gt;Take &lt;strong&gt;CoreWeave&lt;/strong&gt;, a cloud provider for AI workloads: they’ve accumulated &lt;strong&gt;$7.5 billion in debt&lt;/strong&gt;, &lt;a href="https://www.ft.com/content/163c6927-2032-4346-857e-8e3787e4babc" rel="noopener noreferrer"&gt;much of it due to aggressive infrastructure expansion&lt;/a&gt; without sustainable financial models. They now face &lt;strong&gt;$1 billion/year in interest&lt;/strong&gt;. Growth without sovereignty is a debt spiral.  &lt;/p&gt;

&lt;h2&gt;
  
  
  5. Not Every Company Needs Speed. Some Need Sanity
&lt;/h2&gt;

&lt;p&gt;There’s a cult of speed in DevOps: deploy 50 times a day! But most companies — municipalities, utilities, banks — don’t need to push &lt;strong&gt;hourly patches&lt;/strong&gt;. They need reliability.&lt;/p&gt;

&lt;p&gt;A &lt;strong&gt;2025 Vertice survey&lt;/strong&gt; showed that &lt;strong&gt;55% of CFOs&lt;/strong&gt; &lt;a href="https://www.cfodive.com/news/runaway-cloud-spending-frustrates-finance-execs-vertice/694706" rel="noopener noreferrer"&gt;saw cloud spending increase year-over-year&lt;/a&gt;, and &lt;strong&gt;24% called it “significant.”&lt;/strong&gt; These weren’t startup execs—they were finance leaders. The "move fast" mantra, unchecked, has real financial consequences.  &lt;/p&gt;

&lt;h2&gt;
  
  
  6. Data Wants to Stay Home
&lt;/h2&gt;

&lt;p&gt;For many industries, data is too large or too sensitive to outsource. Petabytes of sensor data, medical records, or real-time video don’t want to live in some abstract region. They want proximity. They want privacy. And often, regulators &lt;strong&gt;demand it&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;When regulators ask, “Where is the data?” — you can’t answer “uh... &lt;code&gt;us-east-1&lt;/code&gt;”. You need direct control. On-premise allows you to define, audit, and defend your policies with confidence. &lt;/p&gt;

&lt;p&gt;Compliance is not a checkbox, and it is not something to outsource. Regulatory compliance is a legal and financial time bomb, especially when you’re storing personal data somewhere in the cloud. Frameworks like GDPR in Europe and HIPAA in the U.S. demand strict control over where data resides, who can access it, and how it’s processed. Cloud vendors offer compliance-ready services, but when breaches or misconfigurations occur, it’s your organization, not AWS, GCP, or Azure, that writes the check.&lt;/p&gt;

&lt;p&gt;Consider these real-world reminders:&lt;/p&gt;

&lt;p&gt;💰 &lt;em&gt;TikTok's €530 Million Fine:&lt;/em&gt; In May 2025, Ireland's Data Protection Commission &lt;a href="https://www.theguardian.com/technology/2025/may/02/tiktok-fined-530m-for-failing-to-protect-user-data-from-chinese-state" rel="noopener noreferrer"&gt;fined TikTok €530 million&lt;/a&gt; for unlawfully transferring European user data to China without adequate safeguards, violating the General Data Protection Regulation (GDPR). The investigation revealed that TikTok failed to ensure sufficient data protection measures once the data was transferred, posing significant risks under Chinese surveillance laws. &lt;/p&gt;

&lt;p&gt;💰 &lt;em&gt;European Commission's Breach with Microsoft 365:&lt;/em&gt; In March 2024, the European Data Protection Supervisor found that the European Commission's use of Microsoft 365 &lt;a href="https://www.edps.europa.eu/press-publications/press-news/press-releases/2024/european-commissions-use-microsoft-365-infringes-data-protection-law-eu-institutions-and-bodies_en" rel="noopener noreferrer"&gt;infringed several key data protection rules&lt;/a&gt;. The Commission failed to provide appropriate safeguards for personal data transferred outside the EU/EEA, leading to a suspension order on data flows resulting from its use of Microsoft 365 to countries not covered by an adequacy decision. &lt;/p&gt;

&lt;p&gt;💰 &lt;em&gt;British Airways' £20 Million Fine:&lt;/em&gt; British Airways &lt;a href="https://www.theguardian.com/business/2020/oct/16/ba-fined-record-20m-for-customer-data-breach" rel="noopener noreferrer"&gt;faced a £20 million fine&lt;/a&gt; from the UK's Information Commissioner's Office after a 2018 data breach compromised the personal data of over 400,000 customers. The breach was attributed to poor security arrangements, including the storage of payment card details in plaintext and the use of outdated software, highlighting the airline's failure to protect customer data adequately. &lt;/p&gt;

&lt;p&gt;The moral: &lt;strong&gt;HIPAA and GDPR compliance&lt;/strong&gt; in the cloud isn’t automatic—and it certainly isn’t cheap. Keeping sensitive data on-premise, where access and residency are explicitly controlled, isn't just prudent—it might be the only defensible option when auditors come knocking.&lt;/p&gt;

&lt;h2&gt;
  
  
  7. On-Prem Is Easier to Predict and Debug
&lt;/h2&gt;

&lt;p&gt;Multi-tenant clouds suffer from "noisy neighbors," surprise throttling, and vague incidents. With bare metal, what you provision is what you get.&lt;/p&gt;

&lt;p&gt;A &lt;strong&gt;2025 report by Azul Systems&lt;/strong&gt; &lt;a href="https://www.cio.com/article/3957766/cios-are-overspending-on-the-cloud-but-still-think-its-worth-it.html" rel="noopener noreferrer"&gt;found that&lt;/a&gt; &lt;strong&gt;83% of CIOs overspent on the cloud&lt;/strong&gt;, and &lt;strong&gt;nearly half exceeded budgets by 25% or more&lt;/strong&gt;. You can’t fix what you can’t measure, and cloud abstraction blinds even the best teams.&lt;/p&gt;

&lt;h2&gt;
  
  
  8. The Cloud Is Not Environmentally Holy
&lt;/h2&gt;

&lt;p&gt;The cloud is sold as green, but hyperscale datacenters consume enormous power, often sourced from fossil fuels. Meanwhile, localized, efficient, low-power on-premise servers (especially ARM-based) can be far more sustainable in specific workloads.&lt;/p&gt;

&lt;h2&gt;
  
  
  9. Reclaiming Infrastructure Is Moral
&lt;/h2&gt;

&lt;p&gt;Cloud Capital, a FinOps startup, recently raised $7.7M to combat what they estimate is a &lt;strong&gt;$344 billion/year overspending crisis&lt;/strong&gt; in cloud infrastructure, &lt;a href="https://techstartups.com/2025/04/24/cloud-capital-emerges-from-stealth-with-7-7m-to-tackle-the-344b-cloud-cost-crisis-for-cfos" rel="noopener noreferrer"&gt;projected to hit&lt;/a&gt; &lt;strong&gt;$1 trillion by 2030&lt;/strong&gt;. Their business exists because the DevOps toolchain forgot to add a budget dashboard.  &lt;/p&gt;

&lt;h2&gt;
  
  
  10. Only Rushed Software Farms Need DevOps Teams — And They're Expensive
&lt;/h2&gt;

&lt;p&gt;Let’s be blunt: &lt;strong&gt;DevOps is not cheap&lt;/strong&gt;. You're not just hiring engineers—you’re hiring toolchain babysitters, pipeline janitors, Kubernetes whisperers, and Slack war-room veterans. If your software team ships once a month — or doesn’t need to scale horizontally by Thursday — &lt;strong&gt;why mimic the deployment posture of a fintech unicorn?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Most businesses do not need that speed. They need uptime. They need audit trails. They need security. They do not need to pay $160K+ per year for someone to maintain Helm charts and debate YAML indentation style on GitHub.&lt;/p&gt;

&lt;p&gt;DevOps, in reality, serves urgency. But not every team should live in a permanent state of urgency. If you ship carefully and predictably, you don’t need a DevOps team — &lt;strong&gt;you need discipline and a good sysadmin&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  11. Everything Goes to Platforms
&lt;/h2&gt;

&lt;p&gt;What once ran on a server now lives in a platform. Version control? GitHub. Monitoring? Datadog. Deployments? Vercel. CI/CD? GHA &amp;amp; GitLab, until it breaks, then you’re Googling “runners stuck in pending”. The industry is becoming a patchwork of black boxes with dashboards.&lt;/p&gt;

&lt;p&gt;This isn't just infrastructure [as code] — &lt;strong&gt;it’s dependency&lt;/strong&gt;. You don’t maintain software anymore, you rent a slot in someone else’s stack. And every platform adds abstraction, latency, lock-in, and... another monthly invoice.&lt;/p&gt;

&lt;p&gt;Try doing it your way and you're suddenly unsupported, incompatible, or “non-compliant”. There's &lt;strong&gt;no oxygen left for homegrown DevOps&lt;/strong&gt; or pipelines-only “integrations”.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The industry has decided: DIY is dangerous&lt;/strong&gt;. Everything must be managed, billed, monitored — as a service.&lt;/p&gt;

&lt;p&gt;But this monoculture &lt;strong&gt;comes at a cost&lt;/strong&gt;. You lose understanding. You lose flexibility. And eventually, you lose the ability to build outside of someone else’s rules. What was once craft is now just configuration.&lt;/p&gt;

&lt;p&gt;On-prem systems, for all their grit, don’t hide from you. You know what it’s doing. You can tune it, fix it, or migrate it — without begging for API rate increases or watching a status page for two hours.&lt;/p&gt;

&lt;p&gt;Everything is becoming a platform. &lt;strong&gt;But platforms are not freedom&lt;/strong&gt;. They are permissioned productivity.&lt;/p&gt;

&lt;h2&gt;
  
  
  Epilogue: The Clouds Will Part
&lt;/h2&gt;

&lt;p&gt;Cloud isn’t evil. But it isn’t holy either. It’s just another tool. The problem is cultural: we've confused speed with value, outsourcing with maturity, and complexity with progress.  &lt;/p&gt;

&lt;p&gt;It’s time to land. Touch the machines again. Grab data back.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Cover pic: "Piano" (Klavír), 'Pat and Mat' Czech animation series by 'Krátký film Praha' studio.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>devops</category>
      <category>sysadmin</category>
      <category>myth</category>
    </item>
    <item>
      <title>Accelerating OpenCV with CUDA on Jetson Orin NX: A Complete Build Guide</title>
      <dc:creator>Roman Belshevitz</dc:creator>
      <pubDate>Fri, 21 Feb 2025 14:30:00 +0000</pubDate>
      <link>https://dev.to/rbelshevitz/accelerating-opencv-with-cuda-on-jetson-orin-nx-a-complete-build-guide-525j</link>
      <guid>https://dev.to/rbelshevitz/accelerating-opencv-with-cuda-on-jetson-orin-nx-a-complete-build-guide-525j</guid>
      <description>&lt;h2&gt;
  
  
  What do we have right out of the box?
&lt;/h2&gt;

&lt;p&gt;The NVIDIA Jetson Orin NX is a powerful, community-recognized edge AI platform, designed for real-time computer vision and deep learning applications. &lt;/p&gt;

&lt;p&gt;While &lt;a href="https://developer.nvidia.com/embedded/jetpack-sdk-511" rel="noopener noreferrer"&gt;JetPack 5.1.x&lt;/a&gt; provides an optimized environment with CUDA, cuDNN, and TensorRT, the default &lt;strong&gt;OpenCV&lt;/strong&gt; package in Ubuntu’s repositories does not take full advantage of the GPU. This means that tasks such as object detection, video processing, and feature extraction &lt;strong&gt;run primarily on the CPU&lt;/strong&gt;, significantly limiting performance.&lt;/p&gt;

&lt;p&gt;To unlock the full potential of OpenCV on the Orin NX, we need to build it from source with CUDA and cuDNN enabled. This ensures that image processing and deep learning workloads benefit from GPU acceleration, leading to significant speed improvements. In this guide, we will walk through the entire build process, from installing dependencies to verifying a successful installation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Choosing the Right OpenCV Version
&lt;/h2&gt;

&lt;p&gt;For JetPack 5.1.x on Ubuntu 20.04, the most robust OpenCV versions are &lt;strong&gt;4.5.5&lt;/strong&gt; and &lt;strong&gt;4.6.0&lt;/strong&gt;. These versions have been tested extensively with CUDA 11 and cuDNN, ensuring compatibility and stability. While newer versions, such as 4.7.x and 4.8.x, are available, they may require additional patches and modifications to work seamlessly on Jetson hardware. I recommend sticking with 4.5.5 unless specific features from newer releases are needed.&lt;/p&gt;

&lt;h2&gt;
  
  
  Building OpenCV with CUDA
&lt;/h2&gt;

&lt;p&gt;Before we start, it's important to remove any pre-installed OpenCV versions that might interfere with our custom build. &lt;/p&gt;

&lt;p&gt;The default &lt;code&gt;python3-opencv&lt;/code&gt; package from Ubuntu repositories is &lt;em&gt;CPU-only&lt;/em&gt; and does not support CUDA acceleration. To avoid conflicts, remove it along with other OpenCV-related packages:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sudo apt remove --purge -y libopencv-dev libopencv-core-dev libopencv-imgproc-dev python3-opencv
sudo apt autoremove -y
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After building OpenCV, we must ensure Python correctly loads the CUDA-enabled version. We will set up the &lt;code&gt;PYTHONPATH&lt;/code&gt; accordingly.&lt;/p&gt;
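
&lt;p&gt;For example (the exact site-packages path depends on your Python version and install prefix; the path below assumes Python 3.8 and &lt;code&gt;CMAKE_INSTALL_PREFIX=/usr/local&lt;/code&gt;, so verify it after the build):&lt;/p&gt;

```shell
# Hypothetical install path -- confirm against your actual build output
export PYTHONPATH=/usr/local/lib/python3.8/site-packages:$PYTHONPATH

# Quick sanity check once the build is installed:
python3 -c 'import cv2; print(cv2.__version__, cv2.cuda.getCudaEnabledDeviceCount())'
```

&lt;p&gt;A device count of 1 confirms that Python picked up the CUDA-enabled build.&lt;/p&gt;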

&lt;p&gt;For the Jetson Orin NX, which uses the Ampere architecture, you should adjust the &lt;code&gt;CUDA_ARCH_BIN&lt;/code&gt; to &lt;code&gt;8.7&lt;/code&gt;.&lt;/p&gt;
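
&lt;p&gt;Other Jetson modules use different compute capabilities, so the value is worth double-checking before a multi-hour build. A small helper sketch (the function name is mine, and the mapping covers only the common module families):&lt;/p&gt;

```shell
# Map a Jetson module family to the CUDA_ARCH_BIN value for OpenCV's CMake
cuda_arch_for() {
  case "$1" in
    orin*)      echo "8.7" ;;  # Orin family (Ampere)
    xavier*)    echo "7.2" ;;  # Xavier family (Volta)
    tx2*)       echo "6.2" ;;  # TX2 (Pascal)
    nano*|tx1*) echo "5.3" ;;  # original Nano / TX1 (Maxwell)
    *)          echo "unknown" ;;
  esac
}

cuda_arch_for orin-nx   # prints 8.7
```

&lt;p&gt;When in doubt, confirm the value against NVIDIA's published compute capability tables.&lt;/p&gt;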

&lt;p&gt;💥 Please keep in mind that you will need to download about six hundred megabytes of packages from the Internet! The &lt;code&gt;libcudnn8-dev&lt;/code&gt; package alone weighs 397 MB!&lt;/p&gt;

&lt;p&gt;The following Bash script automates the entire process, ensuring a seamless installation of OpenCV with CUDA and Python bindings.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;

&lt;span class="nb"&gt;set&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt;  &lt;span class="c"&gt;# Exit on error&lt;/span&gt;
&lt;span class="nb"&gt;set&lt;/span&gt; &lt;span class="nt"&gt;-x&lt;/span&gt;  &lt;span class="c"&gt;# Debug mode (prints each command)&lt;/span&gt;

&lt;span class="c"&gt;# Define OpenCV version&lt;/span&gt;
&lt;span class="nv"&gt;OPENCV_VERSION&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"4.5.5"&lt;/span&gt;

&lt;span class="c"&gt;# Update system packages&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;apt update &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;sudo &lt;/span&gt;apt upgrade &lt;span class="nt"&gt;-y&lt;/span&gt;

&lt;span class="c"&gt;# Install required dependencies&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;apt &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; build-essential pv cmake ccache git unzip pkg-config &lt;span class="se"&gt;\&lt;/span&gt;
    libjpeg-dev libpng-dev libtiff-dev libavcodec-dev libavformat-dev &lt;span class="se"&gt;\&lt;/span&gt;
    libswscale-dev libv4l-dev v4l-utils libxvidcore-dev libx264-dev &lt;span class="se"&gt;\&lt;/span&gt;
    libgtk-3-dev libcanberra-gtk3-dev libtbb2 libtbb-dev libdc1394-22-dev &lt;span class="se"&gt;\&lt;/span&gt;
    python3-dev python3-numpy python3-pip libopenblas-dev libopenjp2-7-dev liblapack-dev gfortran &lt;span class="se"&gt;\&lt;/span&gt;
    libhdf5-dev libcudnn8-dev

&lt;span class="c"&gt;# Clone OpenCV and contrib modules&lt;/span&gt;
&lt;span class="nb"&gt;cd&lt;/span&gt; ~
git clone &lt;span class="nt"&gt;--branch&lt;/span&gt; &lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;OPENCV_VERSION&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt; https://github.com/opencv/opencv.git
git clone &lt;span class="nt"&gt;--branch&lt;/span&gt; &lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;OPENCV_VERSION&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt; https://github.com/opencv/opencv_contrib.git

&lt;span class="c"&gt;# Create build directory&lt;/span&gt;
&lt;span class="nb"&gt;cd&lt;/span&gt; ~/opencv
&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; build &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;cd &lt;/span&gt;build

&lt;span class="c"&gt;# Configure CMake with CUDA, cuDNN, and TensorRT&lt;/span&gt;
cmake &lt;span class="nt"&gt;-D&lt;/span&gt; &lt;span class="nv"&gt;CMAKE_BUILD_TYPE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;RELEASE &lt;span class="se"&gt;\&lt;/span&gt;
      &lt;span class="nt"&gt;-D&lt;/span&gt; &lt;span class="nv"&gt;CMAKE_INSTALL_PREFIX&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;/usr/local &lt;span class="se"&gt;\&lt;/span&gt;
      &lt;span class="nt"&gt;-D&lt;/span&gt; &lt;span class="nv"&gt;OPENCV_EXTRA_MODULES_PATH&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;~/opencv_contrib/modules &lt;span class="se"&gt;\&lt;/span&gt;
      &lt;span class="nt"&gt;-D&lt;/span&gt; &lt;span class="nv"&gt;WITH_CUDA&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;ON &lt;span class="se"&gt;\&lt;/span&gt;
      &lt;span class="nt"&gt;-D&lt;/span&gt; &lt;span class="nv"&gt;CUDA_ARCH_BIN&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;8.7 &lt;span class="se"&gt;\&lt;/span&gt;
      &lt;span class="nt"&gt;-D&lt;/span&gt; &lt;span class="nv"&gt;CUDA_ARCH_PTX&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;""&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
      &lt;span class="nt"&gt;-D&lt;/span&gt; &lt;span class="nv"&gt;WITH_CUDNN&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;ON &lt;span class="se"&gt;\&lt;/span&gt;
      &lt;span class="nt"&gt;-D&lt;/span&gt; &lt;span class="nv"&gt;OPENCV_DNN_CUDA&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;ON &lt;span class="se"&gt;\&lt;/span&gt;
      &lt;span class="nt"&gt;-D&lt;/span&gt; &lt;span class="nv"&gt;ENABLE_FAST_MATH&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;ON &lt;span class="se"&gt;\&lt;/span&gt;
      &lt;span class="nt"&gt;-D&lt;/span&gt; &lt;span class="nv"&gt;CUDA_FAST_MATH&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;ON &lt;span class="se"&gt;\&lt;/span&gt;
      &lt;span class="nt"&gt;-D&lt;/span&gt; &lt;span class="nv"&gt;WITH_CUBLAS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;ON &lt;span class="se"&gt;\&lt;/span&gt;
      &lt;span class="nt"&gt;-D&lt;/span&gt; &lt;span class="nv"&gt;WITH_V4L&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;ON &lt;span class="se"&gt;\&lt;/span&gt;
      &lt;span class="nt"&gt;-D&lt;/span&gt; &lt;span class="nv"&gt;WITH_LIBV4L&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;ON &lt;span class="se"&gt;\&lt;/span&gt;
      &lt;span class="nt"&gt;-D&lt;/span&gt; &lt;span class="nv"&gt;WITH_OPENGL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;ON &lt;span class="se"&gt;\&lt;/span&gt;
      &lt;span class="nt"&gt;-D&lt;/span&gt; &lt;span class="nv"&gt;BUILD_OPENCV_PYTHON3&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;ON &lt;span class="se"&gt;\&lt;/span&gt;
      &lt;span class="nt"&gt;-D&lt;/span&gt; &lt;span class="nv"&gt;BUILD_EXAMPLES&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;OFF &lt;span class="se"&gt;\&lt;/span&gt;
      &lt;span class="nt"&gt;-D&lt;/span&gt; &lt;span class="nv"&gt;BUILD_TESTS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;OFF &lt;span class="se"&gt;\&lt;/span&gt;
      &lt;span class="nt"&gt;-D&lt;/span&gt; &lt;span class="nv"&gt;BUILD_DOCS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;OFF &lt;span class="se"&gt;\&lt;/span&gt;
      &lt;span class="nt"&gt;-D&lt;/span&gt; &lt;span class="nv"&gt;BUILD_PERF_TESTS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;OFF &lt;span class="se"&gt;\ &lt;/span&gt;
      &lt;span class="nt"&gt;-D&lt;/span&gt; &lt;span class="nv"&gt;CMAKE_C_COMPILER_LAUNCHER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;ccache &lt;span class="se"&gt;\&lt;/span&gt;
      &lt;span class="nt"&gt;-D&lt;/span&gt; &lt;span class="nv"&gt;CMAKE_CXX_COMPILER_LAUNCHER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;ccache ..

&lt;span class="c"&gt;# Compile OpenCV using all CPU cores&lt;/span&gt;
&lt;span class="nv"&gt;TOTAL_CPP&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;find ~/opencv ~/opencv_contrib/modules &lt;span class="nt"&gt;-name&lt;/span&gt; &lt;span class="s2"&gt;"*.cpp"&lt;/span&gt; | &lt;span class="nb"&gt;wc&lt;/span&gt; &lt;span class="nt"&gt;-l&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
make &lt;span class="nt"&gt;-j&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;nproc&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt; | pv &lt;span class="nt"&gt;-lep&lt;/span&gt; &lt;span class="nt"&gt;-s&lt;/span&gt; &lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;TOTAL_CPP&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;

&lt;span class="c"&gt;# Install OpenCV&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;make &lt;span class="nb"&gt;install
sudo &lt;/span&gt;ldconfig

&lt;span class="c"&gt;# Verify installation&lt;/span&gt;
python3 &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s2"&gt;"import cv2; print(cv2.getBuildInformation())"&lt;/span&gt;

&lt;span class="c"&gt;# Ensure Python recognizes the new OpenCV installation&lt;/span&gt;
&lt;span class="nv"&gt;PYTHON_VERSION&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;python3 &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s2"&gt;"import sys; print('python'+sys.version[:3])"&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"export PYTHONPATH=/usr/local/lib/&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;PYTHON_VERSION&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/site-packages:&lt;/span&gt;&lt;span class="se"&gt;\$&lt;/span&gt;&lt;span class="s2"&gt;PYTHONPATH"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; ~/.bashrc
&lt;span class="nb"&gt;source&lt;/span&gt; ~/.bashrc

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"✅ OpenCV &lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;OPENCV_VERSION&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; built and installed with CUDA!"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
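A side note on the `PYTHON_VERSION` step above: slicing `sys.version` to three characters silently breaks on Python 3.10 and later ("3.10.12" becomes "3.1"), while `sys.version_info` is unambiguous. A quick illustration:

```python
import sys

# Building the tag from sys.version_info is safe on every Python 3.x release:
version_tag = "python{}.{}".format(*sys.version_info[:2])

# The naive three-character slice truncates two-digit minor versions.
# Simulated version banner for a 3.10 interpreter:
banner = "3.10.12 (main, ...)"
naive = "python" + banner[:3]

print(version_tag)  # e.g. python3.10 on a 3.10 interpreter
print(naive)        # python3.1 -- truncated, wrong
```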



&lt;p&gt;Here &lt;code&gt;ccache&lt;/code&gt; (Compiler Cache) speeds up compilation by storing previously compiled object files and reusing them when no source code changes occur. This is useful when frequently tweaking settings &lt;em&gt;or rebuilding&lt;/em&gt; OpenCV.&lt;/p&gt;
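Conceptually, ccache is a content-addressed cache keyed by a hash of the (preprocessed) source: identical input means the stored object file is reused instead of recompiled. A toy Python model of that idea (a sketch, not ccache itself):

```python
import hashlib

cache = {}  # maps hash(source) -> "object file"

def compile_with_cache(source: str) -> str:
    """Toy model of ccache: recompile only when the source actually changed."""
    key = hashlib.sha256(source.encode()).hexdigest()
    if key not in cache:
        # Stand-in for an expensive real compilation step.
        cache[key] = "obj({})".format(key[:8])
    return cache[key]

a = compile_with_cache("int main() { return 0; }")
b = compile_with_cache("int main() { return 0; }")  # cache hit, no recompile
print(a == b, len(cache))  # same object, only one compile happened
```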

&lt;h2&gt;
  
  
  What is this all for?
&lt;/h2&gt;

&lt;p&gt;Building OpenCV from source with CUDA support on the Jetson Orin NX is &lt;strong&gt;an essential optimization&lt;/strong&gt; for developers working with real-time image processing and AI applications. By &lt;strong&gt;leveraging the GPU&lt;/strong&gt; for operations such as object detection, background subtraction, and feature extraction, &lt;a href="https://opencv.org/platforms/cuda/" rel="noopener noreferrer"&gt;performance can improve&lt;/a&gt; &lt;strong&gt;by 5-10x&lt;/strong&gt; compared to CPU-only execution.&lt;/p&gt;

&lt;p&gt;This custom-built OpenCV version seamlessly integrates with Python, ensuring that developers can access GPU acceleration without changing their existing OpenCV-based code. Whether you are deploying deep learning models, processing high-resolution video streams, or performing complex computer vision tasks, this optimized OpenCV installation ensures that your Orin NX operates at its maximum potential.&lt;/p&gt;

&lt;p&gt;With this approach and using the provided script, developers can streamline the build process and focus on developing CUDA-powered applications that take full advantage of NVIDIA's cutting-edge hardware.&lt;/p&gt;

&lt;h2&gt;
  
  
  Performance gain
&lt;/h2&gt;

&lt;p&gt;Motion tracking and optical flow run roughly 7-10x faster with CUDA, and moving object detection gets about a 9x boost, e.g. from ~12 &lt;strong&gt;up to 100 FPS&lt;/strong&gt;. Even relatively simple tasks benefit: CUDA provides a 5x-10x speedup for basic image processing.&lt;/p&gt;
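Translating those FPS figures into per-frame time budgets makes the gain concrete (simple arithmetic on the ~12 and 100 FPS numbers above):

```python
# Per-frame time budget before and after GPU acceleration.
cpu_fps, gpu_fps = 12, 100

cpu_ms = 1000 / cpu_fps      # ~83 ms per frame on the CPU
gpu_ms = 1000 / gpu_fps      # 10 ms per frame on the GPU
speedup = gpu_fps / cpu_fps  # ~8.3x for this example

print(f"CPU: {cpu_ms:.1f} ms/frame, GPU: {gpu_ms:.1f} ms/frame, "
      f"speedup: {speedup:.1f}x")
```

At 10 ms per frame there is headroom left for detection and tracking logic within a 30 FPS (33 ms) real-time budget; at 83 ms there is not.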

&lt;h2&gt;
  
  
  Wait, does NVIDIA’s TensorRT or DeepStream need OpenCV?
&lt;/h2&gt;

&lt;p&gt;TensorRT is a library for optimized deep learning inference on GPUs. It &lt;em&gt;does not need OpenCV&lt;/em&gt;, as it processes models (YOLO, ResNet, etc.) directly. You interact with TensorRT using Python (&lt;code&gt;tensorrt&lt;/code&gt; package) or C++. If you need preprocessing (e.g., image resizing, normalization), OpenCV can help but isn’t required.&lt;/p&gt;
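The preprocessing in question can be done with or without OpenCV; the math itself is trivial. A sketch of per-channel normalization using illustrative ImageNet-style mean/std constants (your model's values may differ):

```python
# Illustrative constants only: ImageNet mean/std, commonly used for
# CNN backbones. Check what your specific model expects.
MEAN = (0.485, 0.456, 0.406)
STD = (0.229, 0.224, 0.225)

def normalize_pixel(rgb):
    """Scale 0-255 RGB to float and apply per-channel mean/std normalization."""
    return tuple((c / 255.0 - m) / s for c, m, s in zip(rgb, MEAN, STD))

# A mid-grey pixel lands near zero in every channel after normalization:
print(normalize_pixel((128, 116, 104)))
```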

&lt;p&gt;DeepStream is a full pipeline for video analytics using TensorRT + GStreamer. If you use DeepStream’s GStreamer-based pipeline, &lt;em&gt;OpenCV is optional&lt;/em&gt;. However, if you’re post-processing model output (e.g., drawing bounding boxes), OpenCV can be useful.&lt;/p&gt;

&lt;p&gt;🤖🔎👀 Wishing you keen machine vision! &lt;/p&gt;

&lt;p&gt;Technical specs sources:&lt;br&gt;
a. &lt;a href="https://developer.nvidia.com/cuda-gpus" rel="noopener noreferrer"&gt;https://developer.nvidia.com/cuda-gpus&lt;/a&gt;&lt;br&gt;
b. &lt;a href="https://developer.download.nvidia.com/assets/embedded/secure/jetson/orin_nx/docs/Jetson_Orin_NX_DS-10712-001_v0.5.pdf" rel="noopener noreferrer"&gt;https://developer.download.nvidia.com/assets/embedded/secure/jetson/orin_nx/docs/Jetson_Orin_NX_DS-10712-001_v0.5.pdf&lt;/a&gt;&lt;br&gt;
c. &lt;a href="https://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/" rel="noopener noreferrer"&gt;https://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/&lt;/a&gt;&lt;br&gt;
d. &lt;a href="https://opencv.org/platforms/cuda/" rel="noopener noreferrer"&gt;https://opencv.org/platforms/cuda/&lt;/a&gt;&lt;/p&gt;

</description>
      <category>nvidia</category>
      <category>jetson</category>
      <category>cuda</category>
      <category>opencv</category>
    </item>
    <item>
      <title>Finding the Right "Brain" and Software for Civilian Drone. Part 2</title>
      <dc:creator>Roman Belshevitz</dc:creator>
      <pubDate>Tue, 19 Mar 2024 13:51:51 +0000</pubDate>
      <link>https://dev.to/rbelshevitz/finding-the-right-brain-and-software-for-civilian-drone-part-2-4h12</link>
      <guid>https://dev.to/rbelshevitz/finding-the-right-brain-and-software-for-civilian-drone-part-2-4h12</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a continuation. The first part is &lt;a href="https://dev.to/rbalashevich/finding-the-right-brain-and-software-for-civilian-drone-2on4"&gt;here&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;About ten years ago there was a US-based company called Aerotenna that made FPGA-based drone controller platforms. They later moved on to specialize in sensors. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://aerotenna.readme.io/docs/unboxing" rel="noopener noreferrer"&gt;The OcPoC-Zynq by Aerotenna&lt;/a&gt; was conceived as a compelling flight controller for those seeking to push the boundaries of drone technology. &lt;/p&gt;

&lt;p&gt;Its compatibility with the open-source PX4 project promised a robust foundation of reliability. &lt;/p&gt;

&lt;p&gt;Unfortunately, the complexity of the implementation (as it turned out, excessive) did not allow all this to come true.&lt;/p&gt;

&lt;h2&gt;
  
  
  Technical Specifications and Unique Features
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Hybrid Architecture
&lt;/h3&gt;

&lt;p&gt;The heart of the OcPoC-Zynq is a Zynq System-on-Chip (SoC), combining an ARM processor with a programmable Field-Programmable Gate Array (FPGA). &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fewv2rxrjhhyb9c7mhlkj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fewv2rxrjhhyb9c7mhlkj.png" alt=" " width="623" height="505"&gt;&lt;/a&gt;&lt;br&gt;
&lt;small&gt;&lt;em&gt;The similar Z-turn Board V2 schematic diagram. Pic source: Xilinx&lt;/em&gt;&lt;/small&gt;&lt;/p&gt;

&lt;p&gt;This grants immense flexibility for hardware acceleration and customization.&lt;/p&gt;

&lt;h3&gt;
  
  
  Processing Power
&lt;/h3&gt;

&lt;p&gt;The chosen Zynq SoC typically features a dual-core ARM Cortex-A9 processor and a sizeable FPGA fabric, ensuring both baseline flight control capabilities and room for advanced functionality.&lt;/p&gt;

&lt;h3&gt;
  
  
  Enhanced I/O
&lt;/h3&gt;

&lt;p&gt;The OcPoC-Zynq boasts numerous configurable input/output pins, supporting a multitude of sensor configurations, communication protocols, and potential payloads.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpsb21shnbhjul36uj4ss.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpsb21shnbhjul36uj4ss.png" alt=" " width="600" height="599"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Sensor Redundancy
&lt;/h3&gt;

&lt;p&gt;To maximize reliability, the OcPoC-Zynq was declared to support triple redundancy for critical sensors like the GPS, IMU, and magnetometer.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp5lr8cok7ocjo7qbs7ek.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp5lr8cok7ocjo7qbs7ek.png" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;br&gt;
&lt;small&gt;&lt;em&gt;The OcPoC-Zynq is an FPGA+ARM SoC based flight control platform mounted on a drone.&lt;/em&gt;&lt;/small&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  A developer-oriented solution
&lt;/h2&gt;

&lt;p&gt;Just replacing an embedded MCU with an FPGA would be a step backward. FPGAs are, however, far better suited to functions such as high-bandwidth digital signal processing. Taken together, these specifications create a platform whose possibilities extend beyond traditional flight controllers. &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The FPGA emerges as a space for hardware-level innovation. Instead of relying solely on the main processor, computationally demanding elements of the PX4 system, such as ones described below, could be offloaded to the FPGA. This distribution of tasks promised smoother flight performance and potential for more sophisticated features.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The FPGA's adaptability allows the OcPoC-Zynq &lt;a href="https://web.archive.org/web/20231128165753/https://aerotenna.readme.io/docs/ocpoctm-zynq-mini" rel="noopener noreferrer"&gt;to embrace custom sensors and specialized peripherals&lt;/a&gt;, propelling it beyond the limitations of standardized drone builds. This makes it particularly appealing within a research context, where the integration of experimental sensors or novel technologies is key. &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Unfortunately, the site with the documentation has been taken down; some information is still available in the Web Archive's cache.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Perhaps most exciting is the potential the OcPoC-Zynq could have for advanced autonomous flight. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwojq8r7fh81kv92zwqyw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwojq8r7fh81kv92zwqyw.png" alt=" " width="800" height="424"&gt;&lt;/a&gt;&lt;br&gt;
&lt;small&gt;&lt;em&gt;Architecture model for a reconfigurable autopilot board. Source: &lt;a href="https://www.mdpi.com/1424-8220/21/4/1115" rel="noopener noreferrer"&gt;Queensland University of Technology&lt;/a&gt;&lt;/em&gt;&lt;/small&gt;&lt;/p&gt;

&lt;p&gt;The FPGA's processing power offers the opportunity to run AI algorithms directly on the drone. This localizes decision-making processes, potentially leading to real-time obstacle avoidance, sophisticated vision-based navigation, and previously unimaginable flight behaviors.&lt;/p&gt;

&lt;p&gt;The OcPoC flight controller isn't just specialized hardware; it runs a full-fledged operating system (i.e., Ubuntu &lt;code&gt;armhf&lt;/code&gt;) on its Zynq-7010 ARM processor. This allows it to use standard Linux-based flight control software.&lt;/p&gt;

&lt;p&gt;You still can &lt;a href="https://github.com/Aerotenna/OcPoC_Mini_Zynq_Files" rel="noopener noreferrer"&gt;clone the repository&lt;/a&gt; and customize your kernel to your specific project using Xilinx's &lt;code&gt;linux-xlnx&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Some functions where the FPGA could be used in a drone
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Sensors
&lt;/h3&gt;

&lt;p&gt;A high-accuracy Extended Kalman Filter fused with an Inertial Measurement Unit (EKF/IMU). This would take input from &lt;a href="https://www.ndigital.com/6dof-explained/" rel="noopener noreferrer"&gt;a 6DOF sensor&lt;/a&gt; and provide attitude tracking in flight.&lt;/p&gt;
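To make the filtering idea concrete, here is a minimal scalar Kalman filter in Python. It is a toy stand-in only: a real flight EKF fuses many states nonlinearly, and on this platform would run in FPGA fabric, not Python.

```python
def kalman_step(x, p, z, q=0.01, r=0.5):
    """One predict/update cycle for a scalar state.

    x: state estimate, p: estimate variance, z: noisy measurement,
    q: process noise, r: measurement noise (toy values).
    """
    p = p + q            # predict: uncertainty grows over time
    k = p / (p + r)      # Kalman gain: how much to trust the measurement
    x = x + k * (z - x)  # update: pull the estimate toward the measurement
    p = (1 - k) * p      # update: uncertainty shrinks after measuring
    return x, p

x, p = 0.0, 1.0  # poor initial guess, high uncertainty
for z in [0.9, 1.1, 1.0, 0.95, 1.05]:  # noisy readings of a true value ~1.0
    x, p = kalman_step(x, p, z)

# The estimate converges toward ~1.0 while the variance shrinks.
print(round(x, 2), round(p, 3))
```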

&lt;h3&gt;
  
  
  Cameras
&lt;/h3&gt;

&lt;p&gt;Video processing from multiple cameras. A single DSP is adequate for one camera input. But, multiple cameras would require an FPGA due to the high bandwidth requirement. So, you could do &lt;a href="https://github.com/Aerotenna/OcPoC_Mini_Zynq_Files/blob/3f03e50e0933fdf9934c262014f87c6c7c17a8e5/Kernel_Config/ocpoc_defconfig#L2239" rel="noopener noreferrer"&gt;some stereo-vision&lt;/a&gt; and &lt;a href="https://www.irjet.net/archives/V7/i1/IRJET-V7I189.pdf" rel="noopener noreferrer"&gt;🗎 object recognition&lt;/a&gt; using the FPGA.&lt;/p&gt;
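The bandwidth argument is easy to check with back-of-the-envelope numbers (assuming uncompressed 1080p30 RGB streams; real camera interfaces and pixel formats vary):

```python
def raw_bandwidth_mb_s(width, height, fps, bytes_per_pixel=3):
    """Raw, uncompressed video bandwidth in MB/s."""
    return width * height * bytes_per_pixel * fps / 1e6

one_cam = raw_bandwidth_mb_s(1920, 1080, 30)  # a single 1080p30 RGB stream
stereo = 2 * one_cam                          # a stereo pair doubles it

print(f"{one_cam:.0f} MB/s per camera, {stereo:.0f} MB/s for a stereo pair")
```

Hundreds of megabytes per second of pixel data is exactly the kind of wide, regular streaming workload that FPGA fabric handles well and a single DSP or MCU does not.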

&lt;h2&gt;
  
  
  The further fate of the solution
&lt;/h2&gt;

&lt;p&gt;It's important to note that unlocking the full potential of the OcPoC-Zynq requires a degree of development expertise. Customizing the FPGA involves hardware design skills, and careful integration with the PX4 software might be necessary to fully realize the benefits of hardware acceleration. &lt;/p&gt;

&lt;p&gt;Nonetheless, the Aerotenna OcPoC-Zynq served as an exceptional platform for researchers and innovators eager to shape the future of drone technology within the dynamic PX4 landscape.&lt;/p&gt;

&lt;p&gt;Unfortunately, there is no news today about the development of this project. The project has been discontinued and is no longer commercially available.&lt;/p&gt;

&lt;p&gt;PX4 v1.11 is the last release that has experimental support for this platform.&lt;/p&gt;

&lt;p&gt;So there are certainly use cases for FPGA on flight controllers, and the author reviewed one of them here. &lt;/p&gt;

&lt;p&gt;Performing checksums on the sensors data interface, digital filters and cameras signal processing - these are small tasks that could be outsourced to the FPGA, since they are expensive on the CPU.&lt;/p&gt;

&lt;p&gt;Yet flying, compared to high-speed video processing, &lt;em&gt;is still a relatively slow process&lt;/em&gt; with little data, so regular ARM CPUs can keep up.&lt;/p&gt;

&lt;p&gt;Also, development for an FPGA is more expensive: the parts are pricey, the high pin counts demand costly boards (&amp;gt; 2 layers), and the design software is expensive. Overall, development takes more time, and time is money.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Most modern ARM MCUs offer an attractive balance of performance and affordability. They are readily available, well-supported, and cost less than FPGA-based solutions.&lt;/p&gt;

&lt;p&gt;In addition, over the decade, a large number of dedicated video transmission and processing modules have appeared on the market, taking on these tasks entirely.&lt;/p&gt;

&lt;p&gt;Thus, we can say that highly integrated ARM MCU solutions have gained the upper hand today, at least in the civil mass drone segment.&lt;/p&gt;

</description>
      <category>embedded</category>
      <category>drones</category>
      <category>zynq</category>
    </item>
    <item>
      <title>Finding the Right "Brain" and Software for Civilian Drone. Part 1</title>
      <dc:creator>Roman Belshevitz</dc:creator>
      <pubDate>Fri, 15 Mar 2024 20:36:52 +0000</pubDate>
      <link>https://dev.to/rbelshevitz/finding-the-right-brain-and-software-for-civilian-drone-2on4</link>
      <guid>https://dev.to/rbelshevitz/finding-the-right-brain-and-software-for-civilian-drone-2on4</guid>
      <description>&lt;p&gt;&lt;em&gt;This is like a beginning. See part 2 &lt;a href="https://dev.to/rbalashevich/finding-the-right-brain-and-software-for-civilian-drone-part-2-4h12"&gt;here&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Unmanned Aerial Vehicles (UAVs), colloquially known as drones, have revolutionized industries ranging from agriculture and construction to surveillance and logistics. &lt;/p&gt;

&lt;p&gt;This transformative potential hinges on the intricate interplay between hardware components and software systems. At the heart of this symbiotic relationship lies the Microcontroller Unit (MCU), an indispensable component orchestrating the UAV's operations. However, the efficacy of UAVs is contingent upon the selection of an appropriate MCU and the deployment of a robust operating system tailored to their unique operational exigencies.&lt;/p&gt;

&lt;h2&gt;
  
  
  Specialized Operating Requirements of UAVs
&lt;/h2&gt;

&lt;p&gt;UAVs operate in dynamic and resource-constrained environments, necessitating operating systems that diverge from traditional paradigms. Unlike desktop computers or laptops, UAVs prioritize power efficiency to maximize flight time, embodying stringent constraints on memory and processing capabilities. Real-time responsiveness assumes paramount importance to ensure flight stability, thereby demanding operating systems that offer deterministic timing and swift reaction to dynamic environmental stimuli.&lt;/p&gt;

&lt;h2&gt;
  
  
  Real-Time Operating Systems: A Tailored Solution
&lt;/h2&gt;

&lt;p&gt;Real-Time Operating Systems (RTOS) have emerged as a tailored solution to address the specialized operational requirements of UAVs. By offering deterministic timing and efficient resource utilization, RTOSes facilitate the seamless execution of critical tasks such as flight control and sensor management. Popular RTOS choices, including &lt;a href="https://www.st.com/content/st_com/en/support/learning/stm32-education/stm32-moocs/freertos-common-microcontroller-software-interface-standard-osv2.html" rel="noopener noreferrer"&gt;FreeRTOS&lt;/a&gt;, &lt;a href="https://hmchung.gitbooks.io/stm32-tutorials/content/nuttx-installation.html" rel="noopener noreferrer"&gt;NuttX&lt;/a&gt;, &lt;a href="https://www.chibios.org/dokuwiki/doku.php?id=chibios:documentation:start" rel="noopener noreferrer"&gt;ChibiOS&lt;/a&gt;, &lt;a href="https://docs.zephyrproject.org/latest/boards/index.html" rel="noopener noreferrer"&gt;Zephyr&lt;/a&gt; and &lt;a href="https://www.rt-thread.io/board.html" rel="noopener noreferrer"&gt;RT-Thread&lt;/a&gt; offer varying strengths in terms of size, security, and hardware support, catering to diverse UAV projects.&lt;/p&gt;

&lt;h2&gt;
  
  
  STM32: A Preferred MCU for UAV Development
&lt;/h2&gt;

&lt;p&gt;The STM32 family of MCUs has garnered widespread acclaim in the realm of UAV development due to its versatility and performance capabilities. With a plethora of options catering to diverse application scenarios, STM32 MCUs offer seamless integration with various RTOS options, thereby enhancing flexibility and scalability in UAV development endeavors.&lt;/p&gt;

&lt;h2&gt;
  
  
  Hardware Abstraction Layers: Facilitating Portability and Focus
&lt;/h2&gt;

&lt;p&gt;Hardware Abstraction Layers (HALs) play a pivotal role in bridging the firmware and hardware components of UAV systems. By encapsulating low-level hardware details and providing a standardized interface, HALs facilitate portability, allowing firmware such as Ardupilot to run seamlessly across different MCUs and RTOSes. Furthermore, HALs enable firmware developers to focus on high-level tasks such as navigation and sensor fusion, thereby enhancing development efficiency and code maintainability.&lt;/p&gt;

&lt;h2&gt;
  
  
  Vendors' Role in HAL Provisioning
&lt;/h2&gt;

&lt;p&gt;Semiconductor vendors, exemplified by STMicroelectronics for the STM32 MCUs family, play a crucial role in providing and maintaining Hardware Abstraction Layers (HALs). One of the pivotal reasons for the widespread adoption of STM32 microcontrollers in civilian drone applications is &lt;a href="https://www.st.com/resource/en/user_manual/dm00173145-description-of-stm32l4-l4-hal-and-low-layer-drivers-stmicroelectronics.pdf" rel="noopener noreferrer"&gt;🗎 their robust support&lt;/a&gt; for various communication buses and protocols, which are essential for facilitating seamless integration with a wide range of peripheral devices and external sensors.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.st.com/en/microcontrollers-microprocessors/stm32-32-bit-arm-cortex-mcus.html" rel="noopener noreferrer"&gt;STM32 MCUs&lt;/a&gt; boast extensive support for industry-standard communication interfaces, including Universal Asynchronous Receiver-Transmitter (UART), Serial Peripheral Interface (SPI), Inter-Integrated Circuit (I2C), Controller Area Network (CAN), and USB. These communication interfaces enable UAV developers to establish reliable and high-speed data exchange with external components, such as GPS modules, inertial measurement units (IMUs), cameras, and telemetry systems.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhz7cev9kx1bqrdgyjo2z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhz7cev9kx1bqrdgyjo2z.png" alt=" " width="800" height="835"&gt;&lt;/a&gt;&lt;br&gt;
&lt;small&gt;&lt;em&gt;Pic source: st.com&lt;/em&gt;&lt;/small&gt; 🔎&lt;/p&gt;

&lt;p&gt;Furthermore, STM32 microcontrollers feature built-in support for popular communication protocols commonly used in UAV applications. This includes protocols such as MAVLink, which facilitates communication between onboard flight controllers and ground control stations, as well as protocols like I2C and SPI for interfacing with peripheral sensors and devices.&lt;/p&gt;

&lt;p&gt;By offering comprehensive support for communication buses and protocols, STMicroelectronics empowers drone-makers to build highly interconnected and interoperable UAV systems. This enables seamless integration of diverse sensor arrays, payload systems, and communication modules, thereby enhancing the functionality, versatility, and performance of civilian drones. &lt;/p&gt;

&lt;p&gt;With STM32 MCUs as the backbone of UAV development, drone-makers can involve a rich ecosystem of communication interfaces and protocols to realize their vision of advanced and mission-critical drone applications.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Zynq UltraScale+ platform, developed by Xilinx (now AMD)
&lt;/h2&gt;

&lt;p&gt;This solution offers several distinct advantages for drone making, particularly in scenarios that demand high computational power, flexibility, and integration of custom hardware accelerators. Below are some key advantages of the Zynq platform for drone development:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.xilinx.com/products/silicon-devices/soc/zynq-ultrascale-mpsoc.html" rel="noopener noreferrer"&gt;Zynq UltraScale+&lt;/a&gt; combines the processing capabilities of Cortex-A53 application processors with the programmable logic of FPGA (Field-Programmable Gate Array) fabric on a single chip. This hybrid architecture &lt;a href="https://linuxgizmos.com/drone-controller-board-runs-linux-on-zynq-ultrascale/" rel="noopener noreferrer"&gt;enables developers to implement custom hardware accelerators&lt;/a&gt; in the FPGA fabric to offload computationally intensive tasks from the CPU, thereby enhancing overall system performance and efficiency.&lt;/p&gt;

&lt;p&gt;Built on the industry success of the Zynq 7000 SoC family, the UltraScale+ MPSoC architecture extends AMD SoCs to enable true heterogeneous multi-processing, with ‘the right engines for the right tasks’ for smarter systems.&lt;/p&gt;

&lt;p&gt;The ARM Cortex-A53 processors integrated into the Zynq UltraScale+ provide significant computational power, enabling drones to execute complex algorithms for tasks such as image processing, computer vision, sensor fusion, and autonomous navigation. This high computational capability is crucial for enabling advanced drone functionalities, including obstacle detection and avoidance, object tracking, and environmental mapping.&lt;/p&gt;

&lt;p&gt;The programmable logic fabric of the Zynq platform allows developers to implement custom hardware accelerators tailored to specific drone applications. The flexibility of FPGAs allows developers to customize algorithms and processing pipelines, unlocking unique capabilities for specialized drone missions. &lt;/p&gt;

&lt;p&gt;Additionally, the reconfigurability of FPGAs enables rapid prototyping and iteration, facilitating agile development processes in the fast-paced UAV industry.&lt;/p&gt;

&lt;p&gt;The Zynq platform offers low-latency processing capabilities, making it suitable for real-time applications in drone control and navigation. By deploying critical control and decision-making algorithms on the ARM processors and offloading latency-sensitive tasks to the FPGA fabric, developers can achieve high responsiveness and stability in drone flight operations.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9a6ina95ffsianoufz9q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9a6ina95ffsianoufz9q.png" alt=" " width="800" height="677"&gt;&lt;/a&gt;&lt;br&gt;
&lt;small&gt;&lt;em&gt;Pic source: &lt;a href="https://linuxgizmos.com/drone-controller-board-runs-linux-on-zynq-ultrascale/" rel="noopener noreferrer"&gt;linuxgizmos.com&lt;/a&gt;&lt;/em&gt;&lt;/small&gt;&lt;/p&gt;

&lt;p&gt;The platform features a rich set of peripheral interfaces, including GPIOs, UARTs, SPI, I2C, USB, and MIPI, facilitating seamless integration with a wide range of sensors, actuators, communication modules, and external devices. This extensive peripheral support simplifies the development of fully integrated drone systems and enables interoperability with existing UAV components and standards.&lt;/p&gt;

&lt;p&gt;The Zynq platform is available in various configurations with different processing capabilities and FPGA resources, allowing developers to choose the optimal combination of performance and cost for their drone applications. Whether designing lightweight drones for aerial photography or high-end UAVs for surveillance and reconnaissance missions, developers can select the Zynq device that best aligns with their performance requirements and budget constraints.&lt;/p&gt;

&lt;p&gt;Xilinx (the maker of Zynq UltraScale+ devices) &lt;a href="https://docs.amd.com/r/en-US/oslib_rm" rel="noopener noreferrer"&gt;provides a comprehensive HAL&lt;/a&gt; as part of their software development tools and frameworks. The primary HAL access point is usually through the Xilinx SDK. It includes libraries and drivers for the various hardware blocks within Zynq UltraScale+ SoCs, including the ARM processor cores, programmable logic, and peripherals.&lt;/p&gt;

&lt;p&gt;A talented Japanese engineer, &lt;a href="https://github.com/ikwzm/ZynqMP-FPGA-Linux" rel="noopener noreferrer"&gt;Ichiro Kawazome, has composed a repository&lt;/a&gt; which provides a Linux boot image (U-Boot bootloader, kernel, RootFS) for the Zynq UltraScale+ MPSoC.&lt;/p&gt;

&lt;h2&gt;
  
  
  Target Applications
&lt;/h2&gt;

&lt;p&gt;STM32 microcontrollers are ideal for lightweight drones, flight controllers, and embedded systems where power efficiency, real-time responsiveness, and cost-effectiveness are paramount.&lt;/p&gt;

&lt;p&gt;Zynq is suitable for drone applications that require high computational power, flexibility, and customization, such as autonomous navigation, advanced image processing, and real-time decision-making.&lt;/p&gt;

&lt;p&gt;The choice between Zynq and STM32 depends on the specific requirements of the drone application, including computational complexity, power efficiency, real-time performance, and system integration needs. While STM32 MCUs excel in low-power operation and real-time responsiveness, making them suitable for a wide range of embedded applications, including drones, Zynq offers higher computational power and hardware customization capabilities.  &lt;/p&gt;

&lt;p&gt;Zynq offers significantly higher computational power compared to STM32 microcontrollers, thanks to its ARM application processors and FPGA fabric.&lt;/p&gt;

&lt;p&gt;Developers can create customized Linux distributions tailored for Zynq-based drones, integrating necessary drivers, libraries, and applications for drone control, sensor interfacing, and communication. This approach offers flexibility and control over the software stack but requires significant development effort.&lt;/p&gt;

&lt;h2&gt;
  
  
  Delineating OS and Firmware: A Conceptual Framework
&lt;/h2&gt;

&lt;p&gt;In the architectural hierarchy of UAV systems, the RTOS serves as the foundational layer, orchestrating task scheduling, resource allocation, and timing management. RTOSes can be remarkably small, especially ones designed for the strict memory constraints of embedded systems.&lt;/p&gt;

&lt;p&gt;The HAL, influenced by the MCU vendor, provides a standardized interface for firmware interaction, encapsulating hardware intricacies and enhancing portability.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdlszwhh2apoijjwkaljj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdlszwhh2apoijjwkaljj.png" alt=" " width="704" height="453"&gt;&lt;/a&gt;&lt;br&gt;
&lt;small&gt;&lt;em&gt;The most common structure of the existing flight controller firmware, which consists of layers and modules. Pic source: &lt;a href="https://www.mdpi.com/2226-4310/9/2/62" rel="noopener noreferrer"&gt;Gyeongsang National University, Rep. of Korea&lt;/a&gt;&lt;/em&gt;&lt;/small&gt;&lt;/p&gt;

&lt;p&gt;Firmware, typified by Ardupilot or PX4, operates at a higher level, focusing on flight control, navigation, and hardware interfacing, thereby delineating the boundaries between OS and application-specific functionalities.&lt;/p&gt;

&lt;p&gt;ChibiOS runs under the hood of ArduPilot, while NuttX plays the same role for PX4.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft2mwi7bihmfaaz1s5wjk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft2mwi7bihmfaaz1s5wjk.png" alt=" " width="800" height="718"&gt;&lt;/a&gt;&lt;br&gt;
&lt;small&gt;&lt;em&gt;A high-level overview of a typical "simple" PX4 system based around a flight controller. Pic source: &lt;a href="https://docs.px4.io/main/en/concept/px4_systems_architecture.html" rel="noopener noreferrer"&gt;docs.px4.io&lt;/a&gt;&lt;/em&gt;&lt;/small&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Is There a Place for Linux Above the Ground?
&lt;/h2&gt;

&lt;p&gt;While Real-Time Operating Systems (RTOSes) like FreeRTOS and NuttX have traditionally dominated civilian drone development, thanks to their real-time responsiveness and lean resource usage, Linux-based platforms, and the Zynq platform in particular, have gained traction in certain niche applications within the UAV ecosystem.&lt;/p&gt;

&lt;p&gt;Linux offers several advantages for civilian drone applications, especially in scenarios where complex computational tasks, such as image processing, data analysis, or machine learning, are required. The Zynq platform, which combines the flexibility and ease of use of an ARM processor with the programmable logic capabilities of an FPGA, presents an attractive option for UAV developers looking to leverage the power of Linux in conjunction with customizable hardware acceleration.&lt;/p&gt;

&lt;p&gt;One area where Linux-based platforms like Zynq have found utility is in high-level mission planning, data processing, and decision-making tasks. For instance, Linux enables developers to deploy sophisticated algorithms for autonomous navigation, obstacle avoidance, and environmental mapping, leveraging the extensive software libraries and development tools available within the Linux ecosystem.&lt;/p&gt;

&lt;p&gt;Additionally, Linux-based platforms offer robust networking capabilities, allowing drones to communicate with ground control stations, cloud services, and other drones in a distributed manner. This facilitates collaborative missions, swarm behavior, and real-time data sharing, thereby expanding the scope of civilian drone applications beyond individual flight operations.&lt;/p&gt;

&lt;p&gt;However, it's essential to acknowledge that Linux-based solutions may not be suitable for all drone applications, particularly those that prioritize real-time responsiveness, low latency, and deterministic behavior. In such cases, RTOSes remain the preferred choice due to their ability to guarantee timing constraints and optimize resource utilization.&lt;/p&gt;

&lt;p&gt;In brief: while RTOSes continue to dominate the civilian drone landscape, Linux-based platforms like Zynq offer a compelling alternative for niche applications that require advanced computational capabilities, networking features, and flexibility.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;As the UAV industry evolves and the demand for more sophisticated autonomous capabilities grows, the integration of Linux-based solutions alongside traditional RTOSes is likely to become increasingly prevalent, driving innovation and expanding the horizons of civilian drone technology.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  The Spirit of Open-Source
&lt;/h2&gt;

&lt;p&gt;The following projects have become very significant within the discussed area of development and have gained a wide community over the past ten years:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://ardupilot.org/" rel="noopener noreferrer"&gt;ArduPilot&lt;/a&gt; is a widely used open-source autopilot system for UAVs. It offers similar features to PX4, including flight control, mission planning, and telemetry support. ArduPilot supports a variety of airframes and can run on microcontroller-based flight controllers like Arduino boards.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://px4.io/" rel="noopener noreferrer"&gt;PX4 Autopilot&lt;/a&gt; is another popular open-source autopilot system for UAVs. It provides a complete set of flight control algorithms, mission planning tools, and communication protocols. PX4 supports a wide range of airframes and is highly customizable, making it suitable for both hobbyist and commercial UAV applications.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://mavlink.io/en/" rel="noopener noreferrer"&gt;MAVLink&lt;/a&gt; is an open-source communication protocol used for exchanging telemetry and control messages between UAVs and ground control stations. It provides a lightweight, efficient messaging format and supports various transport protocols, including serial, UDP, and TCP/IP.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://mavsdk.mavlink.io/main/en/index.html" rel="noopener noreferrer"&gt;MAVSDK&lt;/a&gt; (MAVLink Software Development Kit) is a cross-platform, open-source SDK for accessing MAVLink-based UAV systems programmatically. It provides APIs in multiple programming languages, including C++, Python, and Swift, making it suitable for a wide range of UAV applications.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://opencv.org/" rel="noopener noreferrer"&gt;OpenCV&lt;/a&gt; (Open Source Computer Vision Library) is a popular open-source computer vision library that is widely used in UAV applications for tasks such as object detection, tracking, and image processing. It provides a comprehensive set of algorithms and tools for working with visual data. The reader definitely have to take a look at &lt;a href="https://github.com/Kenil16/master_project" rel="noopener noreferrer"&gt;Kenni Nilsson's project&lt;/a&gt;!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://wiki.ros.org/" rel="noopener noreferrer"&gt;ROS&lt;/a&gt; (Robot Operating System) is an open-source robotics middleware framework that provides libraries and tools for building robotic systems. It includes packages for tasks such as sensor integration, localization, mapping, and path planning, &lt;a href="https://robots.ros.org/category/aerial/" rel="noopener noreferrer"&gt;which are applicable to UAVs as well&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://librepilot.atlassian.net/wiki/spaces/LPDOC/pages/2818105/Welcome" rel="noopener noreferrer"&gt;LibrePilot&lt;/a&gt;, an open-source flight control software for UAVs that offers a range of features including stabilization, navigation, and telemetry. It is designed to work with a variety of hardware platforms and offers extensive configurability and customization options.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.expresslrs.org/" rel="noopener noreferrer"&gt;ExpressLRS (ELRS)&lt;/a&gt; is a rapidly growing open-source radio control link designed to surpass the performance limitations of traditional systems. It prioritizes long range, low latency, and high update rates for applications like FPV drone racing and long-range aircraft. Users have extensive control over the firmware, allowing them to fine-tune parameters like transmission power, frequency bands, telemetry options, and much more. This level of customization caters to a variety of use cases.&lt;/p&gt;

&lt;h2&gt;
  
  
  Future Trajectory of Drone Software
&lt;/h2&gt;

&lt;p&gt;The trajectory of drone software is poised for significant advancements, propelled by burgeoning research endeavors in secure software engineering, artificial intelligence (AI), and programming language design. &lt;/p&gt;

&lt;p&gt;Anticipated advancements encompass the integration of AI algorithms for onboard decision-making, adoption of safer programming languages to mitigate system vulnerabilities, and enhancements in software security mechanisms to fortify UAV resilience against adversarial threats.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In conclusion, the selection of an appropriate MCU or SoC and the deployment of a tailored operating system constitute foundational pillars underpinning the efficacy and reliability of UAV systems. &lt;/p&gt;

&lt;p&gt;By embracing Real-Time Operating Systems (RTOS) and leveraging Hardware Abstraction Layers (HALs), developers can navigate the intricate landscape of UAV software development with confidence, ushering in a new era of innovation and transformative potential across diverse industry verticals.&lt;/p&gt;

&lt;p&gt;It is also difficult to overestimate the contribution of the open source community in popularizing drone-centric development!&lt;/p&gt;

&lt;p&gt;Soft landings and clear skies to everyone!&lt;/p&gt;

&lt;p&gt;&lt;small&gt;&lt;em&gt;Most sources are identified by the links given inside the article.&lt;br&gt;Cover pic by Marian A. Juwan, Pixabay.&lt;/em&gt;&lt;/small&gt;&lt;/p&gt;

&lt;p&gt;&lt;small&gt;&lt;em&gt;See also: &lt;a href="https://www.digikey.co.il/en/maker/projects/getting-started-with-stm32-introduction-to-freertos/ad275395687e4d85935351e16ec575b1" rel="noopener noreferrer"&gt;'Getting Started with STM32 - Introduction to FreeRTOS'&lt;/a&gt;, a blog post by Shawn Hymel @ maker.io / DigiKey.&lt;/em&gt;&lt;/small&gt;&lt;/p&gt;

</description>
      <category>embedded</category>
      <category>drones</category>
      <category>stm32</category>
      <category>zynq</category>
    </item>
    <item>
      <title>32 Kernel’s Teeth for “Chewing” the Network Stack on Linux</title>
      <dc:creator>Roman Belshevitz</dc:creator>
      <pubDate>Mon, 29 May 2023 17:02:03 +0000</pubDate>
      <link>https://dev.to/rbelshevitz/32-kernels-teeth-for-chewing-the-network-stack-on-linux-lnc</link>
      <guid>https://dev.to/rbelshevitz/32-kernels-teeth-for-chewing-the-network-stack-on-linux-lnc</guid>
      <description>&lt;p&gt;The topic of tuning the network stack is very narrow and complex the same time. Today, many tips either do not match the current default settings. Some mechanisms are already included in modern kernels. Below is my compilation of what seems to be relevant at the moment. It's mostly about timeouts and memory consumption. There are a lot of them now and they are cheap.&lt;/p&gt;

&lt;p&gt;In this article I provide a set of recommended network configuration settings for optimizing TCP connections on a server. The suggested settings include adjusting parameters related to orphaned TCP sockets, reducing the timeout for sockets in the &lt;code&gt;FIN-WAIT-2&lt;/code&gt; state, configuring TCP keepalive checks, managing memory allocation for TCP connections, disabling syncookies, selecting a congestion control algorithm, expanding the local port range, enabling protection against &lt;code&gt;TIME_WAIT&lt;/code&gt; attacks, increasing the maximum number of open sockets, adjusting buffer sizes for connections, and optionally disabling local ICMP packet redirects. &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;These discussed settings aim to improve performance, memory usage, and security on powerful and busy servers.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  1
&lt;/h3&gt;

&lt;p&gt;⚙️ Increase the value of &lt;code&gt;tcp_max_orphans&lt;/code&gt;, which determines the maximum number of orphaned (not associated with any process) TCP sockets. Each socket consumes approximately &lt;code&gt;64 KB&lt;/code&gt; of memory. Therefore, the parameter should be matched with the available memory on the server.&lt;br&gt;
&lt;code&gt;net.ipv4.tcp_max_orphans = 65536&lt;/code&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  2
&lt;/h3&gt;

&lt;p&gt;⚙️ Decrease &lt;code&gt;tcp_fin_timeout&lt;/code&gt; (default is &lt;code&gt;60&lt;/code&gt;). This parameter determines the maximum time a socket can remain in the &lt;code&gt;FIN-WAIT-2&lt;/code&gt; state. This state is used when the other party does not close the connection. Each socket occupies about &lt;code&gt;1.5 KB&lt;/code&gt; of memory, which can consume memory when there are many of them.&lt;br&gt;
&lt;code&gt;net.ipv4.tcp_fin_timeout = 10&lt;/code&gt;&lt;/p&gt;
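&lt;p&gt;To gauge whether this tweak matters on your box, you can count the sockets currently stuck in this state. A minimal sketch using &lt;code&gt;ss&lt;/code&gt; from iproute2, with a rough memory estimate based on the 1.5 KB figure above:&lt;/p&gt;

```shell
# Count sockets lingering in FIN-WAIT-2 and roughly estimate their memory cost
count=$(ss -tan state fin-wait-2 2>/dev/null | tail -n +2 | wc -l)
echo "$count sockets in FIN-WAIT-2, roughly $((count * 3 / 2)) KB"
```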

&lt;h3&gt;
  
  
  3
&lt;/h3&gt;

&lt;p&gt;⚙️ Parameters related to TCP connection checks in the &lt;code&gt;SO_KEEPALIVE&lt;/code&gt; status: &lt;code&gt;keepalive_time&lt;/code&gt; specifies the time after which checks will begin after the last activity on the connection, &lt;code&gt;keepalive_intvl&lt;/code&gt; determines the interval between checks, and &lt;code&gt;keepalive_probes&lt;/code&gt; specifies the number of checks.&lt;br&gt;
&lt;code&gt;net.ipv4.tcp_keepalive_time = 1800&lt;/code&gt;&lt;br&gt;
&lt;code&gt;net.ipv4.tcp_keepalive_intvl = 15&lt;/code&gt;&lt;br&gt;
&lt;code&gt;net.ipv4.tcp_keepalive_probes = 5&lt;/code&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  4
&lt;/h3&gt;

&lt;p&gt;⚙️ Pay attention to the parameters &lt;code&gt;net.ipv4.tcp_mem&lt;/code&gt;, &lt;code&gt;net.ipv4.tcp_rmem&lt;/code&gt;, and &lt;code&gt;net.ipv4.tcp_wmem&lt;/code&gt;. They heavily depend on the memory available on the server and are automatically calculated at system boot. In general, it is not necessary to modify them, but sometimes it makes sense to adjust them manually to increase the values.&lt;/p&gt;

&lt;h3&gt;
  
  
  5
&lt;/h3&gt;

&lt;p&gt;⚙️ Disable syncookies (enabled by default), which are sent to a host when the &lt;code&gt;SYN&lt;/code&gt; packet queue for a given socket overflows.&lt;br&gt;
&lt;code&gt;net.ipv4.tcp_syncookies = 0&lt;/code&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  6
&lt;/h3&gt;

&lt;p&gt;⚙️ Special attention should be given to the congestion control algorithm used in TCP networks. There are many algorithms (&lt;code&gt;cubic&lt;/code&gt;, &lt;code&gt;htcp&lt;/code&gt;, &lt;code&gt;bic&lt;/code&gt;, &lt;code&gt;westwood&lt;/code&gt;, etc.), and it is difficult to definitively say which one is better to use. Algorithms show different results in different load scenarios. The kernel parameter &lt;code&gt;tcp_congestion_control&lt;/code&gt; controls this:&lt;br&gt;
&lt;code&gt;net.ipv4.tcp_congestion_control = cubic&lt;/code&gt;&lt;/p&gt;
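&lt;p&gt;Before picking an algorithm, it is worth checking which ones your kernel actually offers. A small inspection snippet reading the standard Linux procfs paths:&lt;/p&gt;

```shell
# List the congestion control algorithms available on this kernel,
# and the one currently in use
cat /proc/sys/net/ipv4/tcp_available_congestion_control
cat /proc/sys/net/ipv4/tcp_congestion_control
```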

&lt;h3&gt;
  
  
  7
&lt;/h3&gt;

&lt;p&gt;⚙️ When the server has a large number of outbound connections, there may not be enough local ports for them. By default, the range &lt;code&gt;32768-60999&lt;/code&gt; is used. It can be expanded:&lt;br&gt;
&lt;code&gt;net.ipv4.ip_local_port_range = 10240 65535&lt;/code&gt;&lt;/p&gt;
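&lt;p&gt;A quick arithmetic check of what the expansion buys you: the default range yields 28,232 usable ports, the expanded one 55,296.&lt;/p&gt;

```shell
# How many ephemeral ports each range yields (both bounds are inclusive)
default_ports=$((60999 - 32768 + 1))
expanded_ports=$((65535 - 10240 + 1))
echo "default: $default_ports ports, expanded: $expanded_ports ports"
```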

&lt;h3&gt;
  
  
  8
&lt;/h3&gt;

&lt;p&gt;⚙️ Enable protection against &lt;code&gt;TIME_WAIT&lt;/code&gt; attacks. By default, it is disabled.&lt;br&gt;
&lt;code&gt;net.ipv4.tcp_rfc1337 = 1&lt;/code&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  9
&lt;/h3&gt;

&lt;p&gt;⚙️ The maximum number of open sockets waiting for connections has a relatively low default value: in kernels up to 5.3 it is &lt;code&gt;128&lt;/code&gt;, and kernel 5.4 raised it to &lt;code&gt;4096&lt;/code&gt;. It makes sense to increase it on busy and powerful servers:&lt;br&gt;
&lt;code&gt;net.core.somaxconn = 16384&lt;/code&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  10
&lt;/h3&gt;

&lt;p&gt;⚙️ On powerful and busy servers, you can increase the default buffer size values for both receiving and transmitting for all connections. This parameter is measured in bytes. By default, it is &lt;code&gt;212992&lt;/code&gt; or &lt;code&gt;208 KB&lt;/code&gt;.&lt;br&gt;
&lt;code&gt;net.core.rmem_default = 851968&lt;/code&gt;&lt;br&gt;
&lt;code&gt;net.core.wmem_default = 851968&lt;/code&gt;&lt;br&gt;
&lt;code&gt;net.core.rmem_max = 12582912&lt;/code&gt;&lt;br&gt;
&lt;code&gt;net.core.wmem_max = 12582912&lt;/code&gt;&lt;/p&gt;
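&lt;p&gt;A quick sanity check of the units behind these numbers:&lt;/p&gt;

```shell
# Convert the raw byte values into human-friendly units
echo "$((212992 / 1024)) KB"          # kernel default per socket
echo "$((851968 / 1024)) KB"          # suggested new default
echo "$((12582912 / 1024 / 1024)) MB" # suggested hard maximum
```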

&lt;h3&gt;
  
  
  11
&lt;/h3&gt;

&lt;p&gt;⚙️ Disable local ICMP packet redirects. This should only be done if your server does not act as a router, i.e., if you have a regular web server.&lt;br&gt;
&lt;code&gt;net.ipv4.conf.all.accept_redirects = 0&lt;/code&gt;&lt;br&gt;
&lt;code&gt;net.ipv4.conf.all.secure_redirects = 0&lt;/code&gt;&lt;br&gt;
&lt;code&gt;net.ipv4.conf.all.send_redirects = 0&lt;/code&gt;&lt;br&gt;
Additionally, you can completely disable kernel-level responses to ICMP requests. It is not a common practice.&lt;/p&gt;
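&lt;p&gt;To persist all of the above across reboots, the settings can be collected into one drop-in file under &lt;code&gt;/etc/sysctl.d/&lt;/code&gt; and applied with &lt;code&gt;sysctl --system&lt;/code&gt;. The file name below is illustrative; pick only the settings that fit your workload:&lt;/p&gt;

```shell
# /etc/sysctl.d/90-tcp-tuning.conf -- file name is illustrative
net.ipv4.tcp_max_orphans = 65536
net.ipv4.tcp_fin_timeout = 10
net.ipv4.tcp_keepalive_time = 1800
net.ipv4.tcp_keepalive_intvl = 15
net.ipv4.tcp_keepalive_probes = 5
net.ipv4.tcp_syncookies = 0
net.ipv4.tcp_congestion_control = cubic
net.ipv4.ip_local_port_range = 10240 65535
net.ipv4.tcp_rfc1337 = 1
net.core.somaxconn = 16384
net.core.rmem_default = 851968
net.core.wmem_default = 851968
net.core.rmem_max = 12582912
net.core.wmem_max = 12582912
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.conf.all.secure_redirects = 0
net.ipv4.conf.all.send_redirects = 0
```

&lt;p&gt;Apply it (as root) with &lt;code&gt;sysctl --system&lt;/code&gt;, which reloads every file in the &lt;code&gt;sysctl.d&lt;/code&gt; directories.&lt;/p&gt;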


&lt;p&gt;Sources:&lt;br&gt;
a. &lt;a href="https://man7.org/linux/man-pages/man7/tcp.7.html" rel="noopener noreferrer"&gt;https://man7.org/linux/man-pages/man7/tcp.7.html&lt;/a&gt;&lt;br&gt;
b. &lt;a href="https://cr.yp.to/syncookies.html" rel="noopener noreferrer"&gt;https://cr.yp.to/syncookies.html&lt;/a&gt;&lt;br&gt;
c. &lt;a href="https://blog.cloudflare.com/when-tcp-sockets-refuse-to-die/" rel="noopener noreferrer"&gt;https://blog.cloudflare.com/when-tcp-sockets-refuse-to-die/&lt;/a&gt;&lt;br&gt;
d. &lt;a href="https://blog.cloudflare.com/unbounded-memory-usage-by-tcp-for-receive-buffers-and-how-we-fixed-it/" rel="noopener noreferrer"&gt;https://blog.cloudflare.com/unbounded-memory-usage-by-tcp-for-receive-buffers-and-how-we-fixed-it/&lt;/a&gt;&lt;br&gt;
e. &lt;a href="https://www.geeksforgeeks.org/tcp-connection-termination/" rel="noopener noreferrer"&gt;https://www.geeksforgeeks.org/tcp-connection-termination/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Thanks to Mike Freemon and Marek 🐦@majek04 Majkowski.&lt;/p&gt;

</description>
      <category>kernel</category>
      <category>tuning</category>
      <category>networking</category>
      <category>linux</category>
    </item>
    <item>
      <title>Liveness Probes: Feel the Pulse of the App</title>
      <dc:creator>Roman Belshevitz</dc:creator>
      <pubDate>Mon, 28 Nov 2022 13:30:16 +0000</pubDate>
      <link>https://dev.to/otomato_io/liveness-probes-feel-the-pulse-of-the-app-133e</link>
      <guid>https://dev.to/otomato_io/liveness-probes-feel-the-pulse-of-the-app-133e</guid>
      <description>&lt;p&gt;This article will provide some helpful examples as the author  examines probes in Kubernetes. A correct probe definition can increase pod availability and resilience!&lt;/p&gt;

&lt;h2&gt;
  
  
  A Kubernetes Liveness Probe: What Is It?
&lt;/h2&gt;

&lt;p&gt;A liveness probe runs a given test to make sure that the application inside a container is alive and working.&lt;/p&gt;

&lt;h3&gt;
  
  
  ⚙️ Liveness probes
&lt;/h3&gt;

&lt;p&gt;They are used by the &lt;code&gt;kubelet&lt;/code&gt; to determine when to restart a container. Applications that crash or enter broken states are detected and, in many cases, can be rectified by restarting them.&lt;/p&gt;

&lt;p&gt;A successful configuration of the liveness probe results in no action being taken and no logs being kept. If it fails, the event is recorded, and the container is killed by the &lt;code&gt;kubelet&lt;/code&gt; in accordance with the &lt;code&gt;restartPolicy&lt;/code&gt; settings.&lt;/p&gt;

&lt;p&gt;A liveness probe should be utilized when a pod appears to be running but the application inside might not be working properly. A deadlock is a classic illustration: the pod is operational, yet it is ineffective because it cannot handle traffic.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fik5g939olte88z1gpa24.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fik5g939olte88z1gpa24.png" alt=" " width="800" height="515"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;🖼️ Pic source: K21Academy&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Since the &lt;code&gt;kubelet&lt;/code&gt; will check the &lt;code&gt;restartPolicy&lt;/code&gt; and restart the container automatically if it is set to &lt;code&gt;Always&lt;/code&gt; or &lt;code&gt;OnFailure&lt;/code&gt;, liveness probes are not required when the application is configured to crash the container on failure. NGINX, &lt;a href="https://serverfault.com/questions/1003361/how-to-automatically-restart-nginx-when-it-goes-down" rel="noopener noreferrer"&gt;for example&lt;/a&gt;, launches rapidly and exits if it encounters a problem that prevents it from serving pages. You do not need a liveness probe in this case.&lt;/p&gt;

&lt;p&gt;There are common adjustable fields for every type of probe:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;initialDelaySeconds&lt;/code&gt;: Probes start running this many seconds after the container is started (default: 0)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;periodSeconds&lt;/code&gt;: How often the probe runs (default: 10)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;timeoutSeconds&lt;/code&gt;: Probe timeout (default: 1)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;successThreshold&lt;/code&gt;: Number of successful probes required to mark the container healthy/ready (default: 1)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;failureThreshold&lt;/code&gt;: Number of consecutive failed probes tolerated before the container is deemed unhealthy/not ready (default: 3)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The &lt;code&gt;periodSeconds&lt;/code&gt; field in each of the examples below says that the &lt;code&gt;kubelet&lt;/code&gt; should run a liveness probe every 5 seconds. The &lt;code&gt;initialDelaySeconds&lt;/code&gt; field instructs the &lt;code&gt;kubelet&lt;/code&gt; to delay the first probe for 5 seconds.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;timeoutSeconds&lt;/code&gt; option (time to wait for a reply), &lt;code&gt;successThreshold&lt;/code&gt; (number of successful probe executions required to mark the container healthy), and &lt;code&gt;failureThreshold&lt;/code&gt; (number of failed probe executions required to mark the container unhealthy) can also be customized, if desired.&lt;/p&gt;

&lt;p&gt;All probe types accept these five parameters.&lt;/p&gt;
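&lt;p&gt;As an illustrative sketch (the &lt;code&gt;/health&lt;/code&gt; path, port, and values are assumptions, not taken from the examples in this article), a liveness probe with all five fields spelled out might look like this:&lt;/p&gt;

```yaml
livenessProbe:
  httpGet:
    path: /health        # hypothetical health endpoint
    port: 8080
  initialDelaySeconds: 5 # wait 5 s after the container starts
  periodSeconds: 10      # probe every 10 s
  timeoutSeconds: 1      # wait up to 1 s for a reply
  successThreshold: 1    # one success marks the container healthy
  failureThreshold: 3    # three consecutive failures restart the container
```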
&lt;h2&gt;
  
  
  What other Kubernetes probes are available?
&lt;/h2&gt;

&lt;p&gt;Although liveness probes are the main emphasis of this article, you should be aware that Kubernetes also supports the following other types of probes:&lt;/p&gt;
&lt;h3&gt;
  
  
  ⚙️ Startup probes
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;kubelet&lt;/code&gt; uses startup probes to determine when a container application has begun. When enabled, a startup probe disables liveness and readiness checks until it succeeds, ensuring those checks don't interfere with application startup.&lt;/p&gt;

&lt;p&gt;Startup probes are especially helpful for slow-starting containers, since they prevent the &lt;code&gt;kubelet&lt;/code&gt; from killing them on a failed liveness probe before they have even started. If liveness probes are used on the same endpoint, set the startup probe's &lt;code&gt;failureThreshold&lt;/code&gt; high enough to allow for lengthy startup periods.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;apps/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Deployment&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;test-api-deployment&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;template&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;test-api&lt;/span&gt;
    &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;test-api&lt;/span&gt;
        &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;myrepo/test-api:0.1&lt;/span&gt;
        &lt;span class="na"&gt;startupProbe&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;httpGet&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/health/startup&lt;/span&gt;
            &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;80&lt;/span&gt;
          &lt;span class="na"&gt;failureThreshold&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;30&lt;/span&gt;
          &lt;span class="na"&gt;periodSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When a Pod starts and the probe fails, Kubernetes retries &lt;code&gt;failureThreshold&lt;/code&gt; times before giving up. For a liveness probe, giving up means restarting the container; for a readiness probe, the Pod is marked &lt;code&gt;Unready&lt;/code&gt;. The default is &lt;code&gt;3&lt;/code&gt;; the minimum value is &lt;code&gt;1&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Some startup probe math: why is it important?
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;0–10 s: the container has been spun up, but the &lt;code&gt;kubelet&lt;/code&gt; does nothing while waiting for &lt;code&gt;initialDelaySeconds&lt;/code&gt; to pass&lt;/li&gt;
&lt;li&gt;10–20 s: the first probe request is sent but no response comes back, because the app hasn't stood up its APIs yet; this is either a failure due to the 2-second timeout or an immediate TCP connection error&lt;/li&gt;
&lt;li&gt;20–30 s: the app is up but has only started fetching credentials, configuration and so on, so the response to the probe request is a 5xx&lt;/li&gt;
&lt;li&gt;30–210 s: the &lt;code&gt;kubelet&lt;/code&gt; keeps probing, but no success response arrives and the limit set by &lt;code&gt;failureThreshold&lt;/code&gt; is reached. In this case, per the deployment configuration for the startup probe, the pod is restarted after roughly 212 seconds.&lt;/li&gt;
&lt;/ul&gt;
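&lt;p&gt;This kind of timeline follows from simple arithmetic: the longest startup window the &lt;code&gt;kubelet&lt;/code&gt; tolerates before giving up is approximately&lt;/p&gt;

```
max startup time ≈ initialDelaySeconds + failureThreshold × periodSeconds
```

&lt;p&gt;For the startup probe in the Deployment above (&lt;code&gt;failureThreshold: 30&lt;/code&gt;, &lt;code&gt;periodSeconds: 10&lt;/code&gt;), that would be up to roughly 300 seconds; the timeline here uses slightly different values.&lt;/p&gt;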

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj1cficnbhlf0oqjn15ak.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj1cficnbhlf0oqjn15ak.png" alt=" " width="800" height="305"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;🖼️ Pic source: Wojciech Sierakowski (HMH Engineering)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;It might be a little excessive to wait more than 3 minutes for the app to launch locally with faked dependencies!&lt;/p&gt;

&lt;p&gt;🎯 It may also be better to shorten this interval if you are absolutely certain that, for example, reading secrets and credentials and establishing connections with DBs and other data sources shouldn't take so long: an overly generous startup window slows down your deployments.&lt;/p&gt;

&lt;p&gt;It is also worth figuring out whether you even need more nodes; you don't want to waste money on resources you don't need. Take a look at &lt;code&gt;kubectl top nodes&lt;/code&gt; to see whether you need to scale the nodes.&lt;/p&gt;

&lt;p&gt;🚧 If a probe fails, the event is recorded and the container is killed by the &lt;code&gt;kubelet&lt;/code&gt; in accordance with the &lt;code&gt;restartPolicy&lt;/code&gt; setting.&lt;/p&gt;

&lt;p&gt;When a container gets restarted, you usually want to check the logs to see why the application went unhealthy. You can do this with the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl logs &amp;lt;pod-name&amp;gt; --previous
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  ⚙️ Readiness probes
&lt;/h3&gt;

&lt;p&gt;Readiness probes track whether the application is able to serve traffic; if the probe fails, no traffic is forwarded to the pod. They are used when an application requires configuration before it becomes usable. An application may also become congested with traffic, causing the probe to fail; this stops further traffic from being routed to the pod and gives it room to recover. When the probe fails, the endpoints controller removes the pod from the service endpoints.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;apps/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Deployment&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;test-api-deployment&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;template&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;test-api&lt;/span&gt;
    &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;test-api&lt;/span&gt;
        &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;myrepo/test-api:0.1&lt;/span&gt;
        &lt;span class="na"&gt;readinessProbe&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;httpGet&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/ready&lt;/span&gt;
            &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;80&lt;/span&gt;
          &lt;span class="na"&gt;successThreshold&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the readiness probe fails but the liveness probe succeeds, the &lt;code&gt;kubelet&lt;/code&gt; concludes that the container is not yet prepared to receive network traffic, but is making progress in that direction.&lt;/p&gt;

&lt;h2&gt;
  
  
  The operation of Kubernetes probes
&lt;/h2&gt;

&lt;p&gt;Probes are controlled by the &lt;code&gt;kubelet&lt;/code&gt;, the primary "node agent" that runs on each node.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsrzal6zufqn9o5ivc2as.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsrzal6zufqn9o5ivc2as.png" alt=" " width="800" height="418"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;🖼️ Pic source: Andrew Lock (Datadog). SVG is &lt;a href="https://andrewlock.net/content/images/2020/k8s_probes.svg" rel="noopener noreferrer"&gt;here&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The application needs to support one of the following handlers in order to use a K8S probe effectively:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;ExecAction&lt;/code&gt; handler: Executes a command inside the container. If the command returns a status code of &lt;code&gt;0&lt;/code&gt;, the diagnostic is successful.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;TCPSocketAction&lt;/code&gt; handler: Tries to establish a TCP connection to the pod's IP address on a particular port. If the port is discovered to be open, the diagnostic is successful.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;HTTPGetAction&lt;/code&gt; handler: Sends an &lt;code&gt;HTTP GET&lt;/code&gt; request to the pod's IP address on a particular port and predetermined path. If the response code falls between &lt;code&gt;200&lt;/code&gt; and &lt;code&gt;399&lt;/code&gt;, the diagnostic is successful.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Before version 1.24 Kubernetes did not support gRPC health checks natively. This left the gRPC developers with the following three approaches when they deploy to Kubernetes:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2d3lx0swx49kxm1qjn54.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2d3lx0swx49kxm1qjn54.png" alt=" " width="800" height="286"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;🖼️ Pic source: Ahmet Alp Balkan (Twitter, ex-Google)&lt;/em&gt;&lt;/p&gt;
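&lt;p&gt;One common pre-1.24 workaround (a sketch; the binary path and port here are assumptions) was to bundle the &lt;code&gt;grpc_health_probe&lt;/code&gt; binary in the image and call it from an exec probe:&lt;/p&gt;

```yaml
livenessProbe:
  exec:
    command:
    - /bin/grpc_health_probe   # assumed install path inside the image
    - -addr=:2379              # gRPC server address to check
  initialDelaySeconds: 5
  periodSeconds: 5
```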

&lt;p&gt;As of Kubernetes version 1.24, a gRPC handler &lt;a href="https://kubernetes.io/blog/2022/05/13/grpc-probes-now-in-beta/" rel="noopener noreferrer"&gt;can be configured&lt;/a&gt; to be used by the &lt;code&gt;kubelet&lt;/code&gt; for application liveness checks if your application implements the gRPC Health Checking Protocol. Checks that use gRPC rely on the &lt;code&gt;GRPCContainerProbe&lt;/code&gt; &lt;a href="https://kubernetes.io/docs/reference/command-line-tools-reference/feature-gates/" rel="noopener noreferrer"&gt;feature gate&lt;/a&gt;, which is enabled by default from v1.24, where the feature reached beta.&lt;/p&gt;

&lt;p&gt;When the &lt;code&gt;kubelet&lt;/code&gt; conducts a probe on a container, the result is &lt;code&gt;Success&lt;/code&gt;, &lt;code&gt;Failure&lt;/code&gt;, or &lt;code&gt;Unknown&lt;/code&gt;, depending on whether the diagnostic succeeded, failed, or could not be completed for some other reason.&lt;/p&gt;
&lt;h2&gt;
  
  
  So, how often should you track the pulse?
&lt;/h2&gt;

&lt;p&gt;Before defining a probe, examine the system's behavior and the typical startup times of the pod and its containers, so that you can choose appropriate thresholds. The probe settings should also be revisited as the infrastructure or application changes. For instance, configuring a pod to use more system resources can affect the values that need to be set for its probes.&lt;/p&gt;
&lt;h2&gt;
  
  
  Handlers in action: some examples
&lt;/h2&gt;
&lt;h3&gt;
  
  
  &lt;code&gt;ExecAction&lt;/code&gt; handler: how can it be useful in practice?
&lt;/h3&gt;

&lt;p&gt;🎯 It allows you to run commands inside containers to check a container's liveness. With this option you can examine several aspects of the container's operation, such as the existence of files, their contents, and other conditions accessible at the command level.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;ExecAction&lt;/code&gt; is executed inside the container and is deemed failed if the command returns any exit code other than &lt;code&gt;0&lt;/code&gt; (zero).&lt;/p&gt;

&lt;p&gt;The example below demonstrates an &lt;code&gt;exec&lt;/code&gt; probe that uses the &lt;code&gt;cat&lt;/code&gt; command to check whether a file exists at the path &lt;code&gt;/usr/share/liveness/html/index.html&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Pod&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;test&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;liveness&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;liveness-exec&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;liveness&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;registry.k8s.io/liveness:0.1&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;containerPort&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;8080&lt;/span&gt;
    &lt;span class="na"&gt;livenessProbe&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;exec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;cat&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;/usr/share/liveness/html/index.html&lt;/span&gt;
      &lt;span class="na"&gt;initialDelaySeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;5&lt;/span&gt;
      &lt;span class="na"&gt;periodSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;5&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;🚧 If the file is missing, the liveness probe fails and the container is restarted.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;TCPSocketAction&lt;/code&gt; handler: how can it be useful in practice?
&lt;/h3&gt;

&lt;p&gt;In this use case, the liveness probe uses the TCP handler to determine whether port &lt;code&gt;8080&lt;/code&gt; is active and open. With this configuration, the &lt;code&gt;kubelet&lt;/code&gt; will attempt to open a socket to your container on the designated port.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Pod&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;liveness&lt;/span&gt;
  &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;liveness-tcp&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;liveness&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;registry.k8s.io/liveness:0.1&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;containerPort&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;8080&lt;/span&gt;
    &lt;span class="na"&gt;livenessProbe&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;tcpSocket&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;8080&lt;/span&gt;
      &lt;span class="na"&gt;initialDelaySeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;5&lt;/span&gt;
      &lt;span class="na"&gt;periodSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;5&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;🚧 If the socket cannot be opened, the liveness probe fails and the container is restarted.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;HTTPGetAction&lt;/code&gt; handler: how can it be useful in practice?
&lt;/h3&gt;

&lt;p&gt;This case demonstrates the HTTP handler, which will send an HTTP GET request to the &lt;code&gt;/health&lt;/code&gt; path on port &lt;code&gt;8080&lt;/code&gt;. Any status code from &lt;code&gt;200&lt;/code&gt; up to, but not including, &lt;code&gt;400&lt;/code&gt; indicates that the probe was successful.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Pod&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;test&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;liveness&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;liveness-http&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;liveness&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;registry.k8s.io/liveness:0.1&lt;/span&gt;
    &lt;span class="na"&gt;livenessProbe&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;httpGet&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/health&lt;/span&gt;
        &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;8080&lt;/span&gt;
        &lt;span class="na"&gt;httpHeaders&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Custom-Header&lt;/span&gt;
          &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ItsAlive&lt;/span&gt;
      &lt;span class="na"&gt;initialDelaySeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;5&lt;/span&gt;
      &lt;span class="na"&gt;periodSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;5&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;🚧 If a code outside this range is returned, the probe fails and the container is restarted. Any custom headers you want to transmit can be defined using the &lt;code&gt;httpHeaders&lt;/code&gt; option.&lt;/p&gt;

&lt;h3&gt;
  
  
  gRPC handler: how can it be useful in practice?
&lt;/h3&gt;

&lt;p&gt;The gRPC protocol is on its way to becoming the &lt;em&gt;lingua franca&lt;/em&gt; for communication between cloud-native microservices. If you are deploying gRPC applications to Kubernetes today, you may be wondering about the best way &lt;a href="https://github.com/grpc/grpc/blob/master/doc/health-checking.md" rel="noopener noreferrer"&gt;to configure health checks&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This example demonstrates how to check port &lt;code&gt;2379&lt;/code&gt; responsiveness using the gRPC health checking protocol. A port must be specified in order to use a gRPC probe. You must also specify the service if the &lt;a href="https://kubernetes.io/docs/reference/using-api/health-checks/" rel="noopener noreferrer"&gt;health endpoint&lt;/a&gt; is set up on a non-default service.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Pod&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;test&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;liveness&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;liveness-gRPC&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;liveness&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;registry.k8s.io/liveness:0.1&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;containerPort&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;2379&lt;/span&gt;
    &lt;span class="na"&gt;livenessProbe&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;grpc&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;2379&lt;/span&gt;
      &lt;span class="na"&gt;initialDelaySeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;5&lt;/span&gt;
      &lt;span class="na"&gt;periodSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;5&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;🚧 If the gRPC health check fails, the liveness probe fails and the container is restarted.&lt;/p&gt;

&lt;p&gt;Since built-in gRPC probes do not distinguish between &lt;a href="https://grpc.github.io/grpc/core/md_doc_statuscodes.html" rel="noopener noreferrer"&gt;error codes&lt;/a&gt;, all errors are regarded as probe failures.&lt;/p&gt;

&lt;h2&gt;
  
  
  Using liveness probes in the wrong way can lead to disaster
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;Please remember that the container will be restarted if the liveness probe fails. It is not conventional to examine dependencies in a liveness probe, unlike a readiness probe. &lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;To determine whether the container itself has stopped responding, a liveness probe should be utilized.&lt;/p&gt;

&lt;p&gt;A drawback of a liveness probe is that it may not actually verify the service's responsiveness. For instance, if a service runs two web servers, one for service routes and one for status routes such as readiness and liveness probes or metrics gathering, the service may be slow or inaccessible while the liveness probe route still responds without any issues. To be effective, the liveness probe must exercise the service in a way comparable to how dependent services use it.&lt;/p&gt;

&lt;p&gt;Like the readiness probe, it's crucial to take into account dynamics that change over time. A slight increase in response time, possibly brought on by a brief rise in load, could force the container to restart if the liveness-probe timeout is too short. The restart might put even more strain on the other pods supporting the service, leading to a further cascade of liveness probe failures and worsening the service's overall availability. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fox7ypl60h7ocwj3csckv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fox7ypl60h7ocwj3csckv.png" alt=" " width="720" height="483"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;🖼️ Pic source: Wojciech Sierakowski (HMH Engineering)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;These cascade failures can be prevented by configuring liveness probe timeouts on the order of client timeouts and employing a forgiving &lt;code&gt;failureThreshold&lt;/code&gt; count.&lt;/p&gt;
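&lt;p&gt;For instance (the numbers here are illustrative, not prescriptive), a liveness probe tuned against such cascades might pair a client-like timeout with a forgiving failure count:&lt;/p&gt;

```yaml
livenessProbe:
  httpGet:
    path: /health      # hypothetical endpoint
    port: 8080
  timeoutSeconds: 5    # on the order of the client timeout
  periodSeconds: 10
  failureThreshold: 6  # tolerate about a minute of slow responses before restarting
```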

&lt;p&gt;Liveness probes may have a small issue with the container startup latency varying over time (see above about the math). Changes in resource allocation, network topology changes, or just rising load as your service grows could all contribute to this. &lt;/p&gt;

&lt;p&gt;If the &lt;code&gt;initialDelaySeconds&lt;/code&gt; option is insufficient and a container is restarted as a result of a Kubernetes node failure or a liveness probe failure, the application may never start, or may start partially before being repeatedly destroyed and restarted. The &lt;code&gt;initialDelaySeconds&lt;/code&gt; option should therefore be greater than the container's maximum initialization time. &lt;/p&gt;

&lt;h2&gt;
  
  
  Some notable suggestions are:
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Keep dependencies out of liveness probes. Liveness probes should be cheap to execute and have consistent response times.&lt;/li&gt;
&lt;li&gt;Set liveness probe timeouts conservatively, so that system dynamics can change temporarily or permanently without causing an excessive number of liveness probe failures. Consider setting client timeouts and liveness-probe timeouts to the same value.&lt;/li&gt;
&lt;li&gt;Set the &lt;code&gt;initialDelaySeconds&lt;/code&gt; option conservatively, so that containers can be restarted reliably even if startup dynamics vary over time.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The inevitable summary
&lt;/h2&gt;

&lt;p&gt;Properly combining liveness probes with readiness and startup probes increases pod resilience and availability, since a container is automatically restarted when a particular check fails. Choosing the right settings for these probes requires understanding your application.&lt;/p&gt;

&lt;p&gt;The author is thankful to Guy Menachem from Komodor for inspiration! Stable applications in the clouds to you all, folks!&lt;/p&gt;

&lt;h3&gt;
  
  
  More to read:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Traefik &lt;a href="https://doc.traefik.io/traefik/user-guides/grpc/#grpc-examples" rel="noopener noreferrer"&gt;docs&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Kubernetes &lt;a href="https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.24/#probe-v1-core" rel="noopener noreferrer"&gt;API reference&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Guy's &lt;a href="https://komodor.com/blog/kubernetes-health-checks-everything-you-need-to-know/" rel="noopener noreferrer"&gt;post&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>devops</category>
      <category>kubernetes</category>
      <category>api</category>
    </item>
    <item>
      <title>Kubernetes TLS, Demystified</title>
      <dc:creator>Roman Belshevitz</dc:creator>
      <pubDate>Tue, 11 Oct 2022 18:16:13 +0000</pubDate>
      <link>https://dev.to/otomato_io/possible-paths-2hfc</link>
      <guid>https://dev.to/otomato_io/possible-paths-2hfc</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;This is the anniversary 10th article in this series.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;🛡️ It is more than obvious that a secure connection to any exposed service running in a Kubernetes cluster is important. &lt;/p&gt;

&lt;p&gt;This article assumes that you wish to set up TLS (Transport Layer Security) for your &lt;a href="https://docs.nginx.com/nginx-ingress-controller/" rel="noopener noreferrer"&gt;ingress resource&lt;/a&gt; and that you already have a functioning ingress controller established in your cluster.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Today, SSL's replacement technology is called Transport Layer Security (TLS). TLS is an enhanced version of SSL. &lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It operates similarly to SSL, using encryption to safeguard the transfer of data and information. Although the term SSL is still commonly used in the industry, the &lt;em&gt;two names are frequently used interchangeably&lt;/em&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting a certificate: what paths can be taken?
&lt;/h2&gt;

&lt;p&gt;A TLS/SSL certificate is the fundamental prerequisite for ingress TLS. These certificates are available to you in the following ways.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Path one&lt;/strong&gt;. Self-signed certificates: the TLS certificate is &lt;a href="https://www.ibm.com/docs/en/api-connect/10.0.1.x?topic=overview-generating-self-signed-certificate-using-openssl" rel="noopener noreferrer"&gt;created and signed&lt;/a&gt; by our own Certificate Authority (root CA). It is a well-known choice for &lt;em&gt;testing scenarios&lt;/em&gt;, where you can distribute the root CA so that browsers will accept the certificate. &lt;/li&gt;
&lt;/ul&gt;
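For path one, a throwaway self-signed certificate can be generated with `openssl` in a single command; a minimal sketch, assuming the placeholder hostname `app.example.com`:

```shell
# Generate a throwaway self-signed certificate and private key
# (365-day validity; the CN below is a placeholder hostname)
openssl req -x509 -newkey rsa:2048 -nodes \
    -keyout server.key -out server.crt \
    -days 365 -subj "/CN=app.example.com"

# Inspect the result
openssl x509 -in server.crt -noout -subject -dates
```

For a more realistic test setup you would instead create a root CA first and sign the server certificate with it, as the linked IBM guide shows.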

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx8ol9xl8v4idbbuh8ahf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx8ol9xl8v4idbbuh8ahf.png" alt=" " width="450" height="310"&gt;&lt;/a&gt;&lt;br&gt;
🖼️ &lt;em&gt;Pic source: Bizagi&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Path two&lt;/strong&gt;. Get an SSL certificate: for production use cases, you must &lt;a href="https://www.google.com/search?q=buy+ssl+certificate" rel="noopener noreferrer"&gt;purchase&lt;/a&gt; an SSL certificate from a reputable certificate authority that operating systems and browsers trust. But you must bear in mind that a so-called &lt;em&gt;wildcard certificate&lt;/em&gt; suitable for protecting all subdomains in a domain can cost $300+/year from major commercial issuers.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Path three&lt;/strong&gt;. Use a Let's Encrypt certificate: Let's Encrypt is a reputable certificate authority that issues &lt;em&gt;free&lt;/em&gt; TLS certificates. &lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  A few words about Let's Encrypt
&lt;/h3&gt;

&lt;p&gt;It is a non-profit organization founded in 2014 by &lt;a href="https://letsencrypt.org/2022/09/12/remembering-peter-eckersley.html" rel="noopener noreferrer"&gt;enthusiasts&lt;/a&gt; fighting for privacy and security. &lt;/p&gt;

&lt;p&gt;The challenge–response protocol used to automate enrollment with the certificate authority is called Automated Certificate Management Environment (&lt;a href="https://letsencrypt.org/how-it-works/" rel="noopener noreferrer"&gt;ACME&lt;/a&gt;). It can query either Web servers or DNS servers controlled by the holder of the domain covered by the certificate to be issued.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuzzu0229x0x0tye8d3fk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuzzu0229x0x0tye8d3fk.png" alt=" " width="689" height="380"&gt;&lt;/a&gt;&lt;br&gt;
🖼️ &lt;em&gt;Pic source: Let's Encrypt&lt;/em&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;If you are interested in the implementation of the protocol, read what &lt;a href="https://blog.acolyer.org/2020/02/12/lets-encrypt-an-automated-certificate-authority-to-encrypt-the-entire-web/" rel="noopener noreferrer"&gt;Adrian Colyer&lt;/a&gt; from SpringSource writes about them.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Each SSL certificate has an &lt;em&gt;expiration date&lt;/em&gt;. So, before the certificate expires, you need to &lt;em&gt;rotate&lt;/em&gt; it. For instance, Let's Encrypt certificates have a &lt;em&gt;three-month&lt;/em&gt; (90-day) lifetime (and &lt;a href="https://letsencrypt.org/2015/11/09/why-90-days.html" rel="noopener noreferrer"&gt;here&lt;/a&gt; they tell why).&lt;/p&gt;
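Checking when a certificate expires is a one-liner with `openssl`. A self-contained sketch (the 90-day self-signed certificate here just mimics the Let's Encrypt lifetime; names are placeholders):

```shell
# Issue a short-lived (90-day) self-signed certificate to mimic
# the Let's Encrypt lifetime
openssl req -x509 -newkey rsa:2048 -nodes \
    -keyout le-demo.key -out le-demo.crt \
    -days 90 -subj "/CN=demo.example.com"

# Print its expiration date; in production, alert well before this date
openssl x509 -in le-demo.crt -noout -enddate
```

For a live server you would feed `openssl x509` from `openssl s_client -connect host:443` instead of a local file.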

&lt;p&gt;Later in this article series, the author will dwell on the &lt;strong&gt;third path&lt;/strong&gt; in detail. Why? Because this path is interesting for its relative self-sufficiency and [relative] independence from commercial / state-owned certificate issuers. In general, the motto of this path is: "If you made something with your own hands, you know how it works, so you're better adapted to survival!"&lt;/p&gt;

&lt;p&gt;Of course, the Let's Encrypt approach does not fit &lt;em&gt;every&lt;/em&gt; need, but it works for academic purposes and for startups.&lt;/p&gt;

&lt;p&gt;But first, let's look at the situation "in manual mode" and simply try to associate a certificate with a protected application. So, &lt;/p&gt;
&lt;h3&gt;
  
  
  Chicken or egg?
&lt;/h3&gt;

&lt;p&gt;The &lt;em&gt;ingress controller&lt;/em&gt;, not the ingress resource, is in charge of SSL. In other words, the ingress controller &lt;em&gt;accesses&lt;/em&gt; the TLS certificates you provide to the ingress resource as a Kubernetes &lt;code&gt;secret&lt;/code&gt; and incorporates them into its configuration.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs68iypo5revwa9w665am.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs68iypo5revwa9w665am.png" alt=" " width="768" height="467"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Setup TLS/SSL certificates for ingress
&lt;/h2&gt;

&lt;p&gt;Let's examine the procedures for setting up TLS for ingress. We'll start by launching a test application on the cluster. This application will be used to test our TLS-secured ingress.&lt;/p&gt;

&lt;p&gt;First, create a new namespace, &lt;code&gt;trial&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl create namespace trial
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Save the following as &lt;code&gt;hello-app.yaml&lt;/code&gt;; it contains the &lt;code&gt;Deployment&lt;/code&gt; and &lt;code&gt;Service&lt;/code&gt; objects.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;apps/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Deployment&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;hello-app&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;trial&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;selector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;matchLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;hello&lt;/span&gt;
  &lt;span class="na"&gt;replicas&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
  &lt;span class="na"&gt;template&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;hello&lt;/span&gt;
    &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;hello&lt;/span&gt;
        &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rbalashevich/hello-app:2.0"&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Service&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;hello-service&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;trial&lt;/span&gt;
  &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;hello&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ClusterIP&lt;/span&gt;
  &lt;span class="na"&gt;selector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;hello&lt;/span&gt;
  &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;80&lt;/span&gt;
    &lt;span class="na"&gt;targetPort&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;8080&lt;/span&gt;
    &lt;span class="na"&gt;protocol&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;TCP&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Deploy the application with a command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl apply -f hello-app.yaml -n trial
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Create a Kubernetes TLS Secret
&lt;/h3&gt;

&lt;p&gt;It is necessary to make the SSL certificate a Kubernetes secret. It will subsequently be referenced in the &lt;code&gt;tls&lt;/code&gt; block of the &lt;code&gt;Ingress&lt;/code&gt; resource.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;server.crt&lt;/code&gt; (certificate with its CA trust chain) and &lt;code&gt;server.key&lt;/code&gt; (private key) SSL files are assumed to be available from a Certificate Authority, your company, or, as a last resort, self-signed.&lt;/p&gt;

&lt;p&gt;⚠️ A private key is created by you (the certificate owner) when you request your certificate with a Certificate Signing Request (CSR). In other words, you generate a private key together with the CSR. You submit the CSR to the certificate authority and keep the private key in a safe place. &lt;/p&gt;
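This CSR step can be sketched with `openssl`; the subject below reuses the example hostname from this article and is, of course, a placeholder:

```shell
# Generate a fresh private key and a CSR for it in one command.
# server.key never leaves your machine; only server.csr is
# submitted to the certificate authority.
openssl req -new -newkey rsa:2048 -nodes \
    -keyout server.key -out server.csr \
    -subj "/CN=app.hosting.cloudprovider.com"

# Sanity-check the CSR before submitting it
openssl req -in server.csr -noout -verify -subject
```

The CA then returns the signed `server.crt` corresponding to this key.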

&lt;p&gt;As for the big three public cloud providers, they have instructions for exporting certificates: &lt;a href="https://docs.aws.amazon.com/acm/latest/userguide/export-private.html" rel="noopener noreferrer"&gt;AWS CM&lt;/a&gt;, &lt;a href="https://cloud.google.com/sdk/gcloud/reference/privateca/certificates" rel="noopener noreferrer"&gt;GCP CAS&lt;/a&gt;, &lt;a href="https://learn.microsoft.com/en-us/azure/key-vault/certificates/how-to-export-certificate?tabs=azure-cli" rel="noopener noreferrer"&gt;Azure KV&lt;/a&gt;.&lt;/p&gt;


&lt;blockquote&gt;
&lt;p&gt;And yes, keep the private key (&lt;code&gt;server.key&lt;/code&gt;)!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Let's use the &lt;code&gt;server.crt&lt;/code&gt; and &lt;code&gt;server.key&lt;/code&gt; files to construct a Kubernetes secret of &lt;code&gt;tls&lt;/code&gt; type (SSL certificates). In the &lt;code&gt;trial&lt;/code&gt; namespace, where the &lt;code&gt;hello-app&lt;/code&gt; deployment is located, we are creating the secret.&lt;/p&gt;

&lt;p&gt;Run the &lt;code&gt;kubectl&lt;/code&gt; command listed below from the directory where your &lt;code&gt;.crt&lt;/code&gt; and &lt;code&gt;.key&lt;/code&gt; files are located, or supply the &lt;em&gt;absolute path&lt;/em&gt; to the files. The name &lt;code&gt;hello-app-tls&lt;/code&gt; is made up.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl create secret tls hello-app-tls \
    --namespace trial \
    --key server.key \
    --cert server.crt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The equivalent YAML manifest is provided below; note that the values under &lt;code&gt;data&lt;/code&gt; must be the &lt;em&gt;base64-encoded&lt;/em&gt; contents of the &lt;code&gt;.crt&lt;/code&gt; and &lt;code&gt;.key&lt;/code&gt; files.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Secret&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;hello-app-tls&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;trial&lt;/span&gt;
&lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;kubernetes.io/tls&lt;/span&gt;
&lt;span class="na"&gt;data&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;server.crt&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
       &lt;span class="s"&gt;&amp;lt;crt contents here&amp;gt;&lt;/span&gt;
  &lt;span class="na"&gt;server.key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
       &lt;span class="s"&gt;&amp;lt;private key contents here&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A Kubernetes ingress is &lt;a href="https://kubernetes.io/docs/concepts/services-networking/ingress/" rel="noopener noreferrer"&gt;a set of rules&lt;/a&gt; that can be configured to give services externally reachable URLs. Based on this understanding, to turn on a secure connection, we should add a &lt;code&gt;tls&lt;/code&gt; block to the &lt;code&gt;Ingress&lt;/code&gt; object. So, in the &lt;code&gt;trial&lt;/code&gt; namespace, we create a sample TLS-capable ingress resource:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;networking.k8s.io/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Ingress&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;hello-app-ingress&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;trial&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;ingressClassName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nginx&lt;/span&gt;
  &lt;span class="na"&gt;tls&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;hosts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;app.hosting.cloudprovider.com&lt;/span&gt;
    &lt;span class="na"&gt;secretName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;hello-app-tls&lt;/span&gt;
  &lt;span class="na"&gt;rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;host&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;app.hosting.cloudprovider.com"&lt;/span&gt;
    &lt;span class="na"&gt;http&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;paths&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;pathType&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Prefix&lt;/span&gt;
          &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/"&lt;/span&gt;
          &lt;span class="na"&gt;backend&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;service&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;hello-service&lt;/span&gt;
              &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                &lt;span class="na"&gt;number&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;80&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;⚠️ Replace &lt;code&gt;app.hosting.cloudprovider.com&lt;/code&gt; with your actual hostname. The &lt;code&gt;host(s)&lt;/code&gt; must be the same in both the &lt;code&gt;rules&lt;/code&gt; and &lt;code&gt;tls&lt;/code&gt; blocks of the &lt;code&gt;Ingress&lt;/code&gt; manifest. In other words, they must match.&lt;/p&gt;

&lt;p&gt;If you are using the NGINX ingress controller and want &lt;em&gt;strict&lt;/em&gt; SSL, you can add an &lt;a href="https://github.com/kubernetes/ingress-nginx/blob/main/docs/user-guide/nginx-configuration/annotations.md" rel="noopener noreferrer"&gt;annotation&lt;/a&gt; supported by your ingress controller. For instance, the annotation &lt;code&gt;nginx.ingress.kubernetes.io/backend-protocol: "HTTPS"&lt;/code&gt; tells the NGINX ingress controller to keep traffic encrypted all the way up to the application.&lt;/p&gt;
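A sketch of where that annotation lives, reusing the names from the earlier example (the backend itself must serve HTTPS for this to be useful):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: hello-app-ingress
  namespace: trial
  annotations:
    # The controller re-encrypts and speaks HTTPS to the backend pods
    nginx.ingress.kubernetes.io/backend-protocol: "HTTPS"
```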

&lt;h3&gt;
  
  
  The way to make sure
&lt;/h3&gt;

&lt;p&gt;Let's check with &lt;code&gt;curl https://app.hosting.cloudprovider.com -kv&lt;/code&gt; whether the connection to the app is secure now:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server accepted to use h2
* Server certificate:
*  subject: CN=app.hosting.cloudprovider.com
*  start date: Oct 6 15:35:07 2022 GMT
*  expire date: Oct 6 15:35:07 2023 GMT
*  issuer: CN=Go Daddy Secure Certificate Authority - G2,
              OU=http://certs.godaddy.com/repository/,
              O="GoDaddy.com, Inc.",L=Scottsdale,ST=Arizona,C=US
*  SSL certificate verify ok.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;🔒 If the certificate is valid, the browser will not complain and there will be no frightening warnings either. Voilà, the connection to our app is secure!&lt;/p&gt;

&lt;p&gt;Okay, we've covered the first and second paths. The next step is exploring the third path, which involves a Let's Encrypt certificate. &lt;/p&gt;

&lt;h2&gt;
  
  
  Estne vita vere brevis?
&lt;/h2&gt;

&lt;p&gt;Sed vita est cum dignitate vivendum. As the author noted above, the life of a Let's Encrypt certificate is &lt;a href="https://letsencrypt.org/2015/11/09/why-90-days.html" rel="noopener noreferrer"&gt;short&lt;/a&gt;; that is the price you pay for free certificates. Accordingly, a solution is required that automates the re-issuance of short-lived certificates, right? And such a solution exists: it is &lt;code&gt;cert-manager&lt;/code&gt;! It streamlines the process of getting, renewing, and using certificates by adding certificates and certificate issuers as &lt;em&gt;resource types&lt;/em&gt; in Kubernetes clusters.&lt;/p&gt;

&lt;p&gt;It can generate certificates from a number of supported sources, including Let's Encrypt.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2w3rx680y8nvz18b9ms8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2w3rx680y8nvz18b9ms8.png" alt=" " width="747" height="435"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Furthermore, it will check that certificates are current and valid and make an attempt to &lt;strong&gt;renew&lt;/strong&gt; them for a specified period &lt;strong&gt;before&lt;/strong&gt; they &lt;strong&gt;expire&lt;/strong&gt;.&lt;/p&gt;
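As a hedged sketch of that renewal behavior, a Certificate resource might look like the following; the resource names and the `letsencrypt-staging` issuer are assumptions made for illustration:

```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: hello-app-cert
  namespace: trial
spec:
  secretName: hello-app-tls        # Secret where the issued cert is stored
  duration: 2160h                  # 90 days, the Let's Encrypt lifetime
  renewBefore: 360h                # start renewal 15 days before expiry
  dnsNames:
  - app.hosting.cloudprovider.com
  issuerRef:
    name: letsencrypt-staging      # an Issuer assumed to exist in this namespace
    kind: Issuer
```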

&lt;h3&gt;
  
  
  Install cert-manager on Kubernetes
&lt;/h3&gt;

&lt;p&gt;According to the official &lt;code&gt;cert-manager&lt;/code&gt; documentation, you can install it by using &lt;a href="https://cert-manager.io/docs/installation/kubectl/" rel="noopener noreferrer"&gt;kubectl&lt;/a&gt; or by the provided &lt;a href="https://cert-manager.io/docs/installation/helm/" rel="noopener noreferrer"&gt;helm&lt;/a&gt; chart.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Create a dedicated Kubernetes namespace for cert-manager
kubectl create namespace cert-manager

# Add official cert-manager repository to helm CLI
helm repo add jetstack https://charts.jetstack.io

# Update Helm repository cache (think of apt update)
helm repo update

# Install cert-manager on Kubernetes
## cert-manager relies on several Custom Resource Definitions (CRDs)
helm install certmgr jetstack/cert-manager \
    --set installCRDs=true \
    --version v1.9.1 \
    --namespace cert-manager
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;Issuer&lt;/code&gt; is responsible for issuing certificates. It is the signing authority and, based on its configuration, knows how certificate requests are handled.  &lt;/p&gt;

&lt;p&gt;Cert-manager also creates several objects using different specifications such as &lt;code&gt;CertificateRequest&lt;/code&gt;. &lt;/p&gt;

&lt;p&gt;A &lt;code&gt;Certificate&lt;/code&gt; resource is a readable representation of a certificate request. Certificate resources are linked to an &lt;code&gt;Issuer&lt;/code&gt; who is responsible for requesting and renewing the certificate. &lt;/p&gt;

&lt;p&gt;To determine &lt;em&gt;if&lt;/em&gt; a certificate &lt;em&gt;needs to be re-issued&lt;/em&gt;, &lt;code&gt;cert-manager&lt;/code&gt; looks at the &lt;code&gt;spec&lt;/code&gt; of the &lt;code&gt;Certificate&lt;/code&gt; resource and the latest &lt;code&gt;CertificateRequests&lt;/code&gt;, as well as the data in the &lt;code&gt;Secret&lt;/code&gt; containing the certificate.&lt;/p&gt;

&lt;h3&gt;
  
  
  Let's Encrypt: staging or production server?
&lt;/h3&gt;

&lt;p&gt;An &lt;code&gt;Issuer&lt;/code&gt; is a custom resource (CRD) which tells &lt;code&gt;cert-manager&lt;/code&gt; how to sign a &lt;code&gt;Certificate&lt;/code&gt;. Following &lt;a href="https://cert-manager.io/docs/tutorials/getting-started-with-cert-manager-on-google-kubernetes-engine-using-lets-encrypt-for-ingress-ssl/" rel="noopener noreferrer"&gt;this howto (section 7)&lt;/a&gt;, the &lt;code&gt;Issuer&lt;/code&gt; will be configured to connect to the Let's Encrypt staging server, which allows you to test everything without using up your Let's Encrypt &lt;a href="https://letsencrypt.org/docs/rate-limits/" rel="noopener noreferrer"&gt;certificate quota&lt;/a&gt; for the domain name. &lt;/p&gt;

&lt;p&gt;After debugging, you can safely issue a certificate by using LE's production server.&lt;/p&gt;
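A minimal sketch of such a staging Issuer, assuming the NGINX ingress controller and a placeholder contact address:

```yaml
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: letsencrypt-staging
  namespace: trial
spec:
  acme:
    # Staging endpoint: does not count against production rate limits
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    email: you@example.com           # placeholder contact address
    privateKeySecretRef:
      name: letsencrypt-staging-key  # where the ACME account key is stored
    solvers:
    - http01:
        ingress:
          class: nginx
```

Once everything works, changing `server` to `https://acme-v02.api.letsencrypt.org/directory` switches the Issuer to production.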

&lt;p&gt;For a video describing &lt;code&gt;cert-manager&lt;/code&gt; YAML syntax, the author of this article recommends &lt;a href="https://youtu.be/7m4_kZOObzw" rel="noopener noreferrer"&gt;📽️ Anton Putra's good one&lt;/a&gt;. &lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Let's Encrypt's reputation as a reliable certificate authority has made SSL certificate acquisition simple. Together with the &lt;code&gt;cert-manager&lt;/code&gt; tool, ops engineers can quickly and easily ensure correct transport encryption and interoperability with already-existing parts like NGINX Ingress. In addition to the example mentioned above, &lt;code&gt;cert-manager&lt;/code&gt; can help with trickier situations like those involving wildcard SSL certificates.&lt;/p&gt;

&lt;p&gt;If you're interested in using Let's Encrypt outside of a Kubernetes cluster, take a look at &lt;a href="https://github.com/caddyserver/caddy" rel="noopener noreferrer"&gt;Caddy&lt;/a&gt;, a 43k ⭐ open-source web server, and also at Certbot, a 29k ⭐ ACME client which is open source, too.&lt;/p&gt;

&lt;p&gt;Ever tried using &lt;code&gt;wireshark&lt;/code&gt; to monitor web traffic? Follow &lt;a href="https://www.comparitech.com/net-admin/decrypt-ssl-with-wireshark/" rel="noopener noreferrer"&gt;Aaron Phillips&lt;/a&gt; from Comparitech to learn how.&lt;/p&gt;

&lt;p&gt;Safe connections to you!&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>ssl</category>
      <category>api</category>
      <category>crd</category>
    </item>
    <item>
      <title>Admission Controllers in Action: Datree's Approach</title>
      <dc:creator>Roman Belshevitz</dc:creator>
      <pubDate>Sun, 11 Sep 2022 09:19:45 +0000</pubDate>
      <link>https://dev.to/otomato_io/admission-controllers-in-action-datrees-approach-143d</link>
      <guid>https://dev.to/otomato_io/admission-controllers-in-action-datrees-approach-143d</guid>
      <description>&lt;p&gt;In the &lt;a href="https://dev.to/otomato_io/responsible-approach-to-communicating-with-the-api-server-admission-controllers-3b49"&gt;eighth part&lt;/a&gt;, the author talked about admission controllers. In this, the ninth, we will see how ACs can be used for practical purposes.&lt;/p&gt;

&lt;p&gt;At the same time, this may be considered the second part of the &lt;a href="https://dev.to/otomato_io/datree-a-tool-which-really-shifts-your-cluster-security-even-more-left-1g20"&gt;review&lt;/a&gt;, so both parts will be marked with the appropriate &lt;code&gt;#datree&lt;/code&gt; tag.&lt;/p&gt;

&lt;h2&gt;
  
  
  The originality of Datree's approach
&lt;/h2&gt;

&lt;p&gt;In brief, &lt;a href="https://github.com/datreeio/admission-webhook-datree" rel="noopener noreferrer"&gt;Datree's integration&lt;/a&gt; enables you to check your resources against the defined policy &lt;strong&gt;a moment before&lt;/strong&gt; you put them into a cluster... by leveraging an admission webhook! 😎 &lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/datreeio/admission-webhook-datree/blob/main/kube/validating-webhook-configuration.yaml" rel="noopener noreferrer"&gt;The webhook&lt;/a&gt; implemented with &lt;code&gt;ValidatingWebhookConfiguration&lt;/code&gt; will detect &lt;a href="https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/#webhook-configuration" rel="noopener noreferrer"&gt;operations&lt;/a&gt; such as &lt;code&gt;CREATE&lt;/code&gt;, &lt;code&gt;UPDATE&lt;/code&gt; or &lt;code&gt;DELETE&lt;/code&gt;, and it will start a policy check against the configs related to each operation. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjvvkt02z3978vwpma40z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjvvkt02z3978vwpma40z.png" alt=" " width="427" height="127"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If any configuration errors are discovered, the webhook will refuse the action and show a thorough output with guidance on how to fix each error.&lt;/p&gt;

&lt;p&gt;Once the webhook is installed, every cluster operation it is tied to will trigger a Datree policy check. If there are no configuration errors, the resource will get a green light🚦 to be applied or updated. &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;🔬 Datree functions well both in a full-scale cluster and in a &lt;code&gt;k3s/k3d&lt;/code&gt;-baked one! It makes debugging easier even during local development. &lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Let's go step by step
&lt;/h2&gt;

&lt;p&gt;Following the Software-as-a-Service paradigm, Datree provides its users access to a misconfigurations database and to a personal workspace on their website, where all the checks initiated by the user are aggregated. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqeq74rpsh6peeouf91yo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqeq74rpsh6peeouf91yo.png" alt=" " width="800" height="398"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This cheeky astronaut design will brighten up your day.&lt;/p&gt;

&lt;h3&gt;
  
  
  🎯 Step 1. Access token
&lt;/h3&gt;

&lt;p&gt;They [want to] know everything about you! Well, relax, it is a joke. Sign up or &lt;a href="https://app.datree.io/login" rel="noopener noreferrer"&gt;log in&lt;/a&gt;, then grab your token to access &lt;code&gt;datree&lt;/code&gt; programmatically. API access tokens are widespread in 2022, aren't they? 🔏&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgx48pcy2urgxpngqbhzb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgx48pcy2urgxpngqbhzb.png" alt=" " width="711" height="283"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;💰 There is only one token available when using the so-called Free Plan, enough for evaluation purposes (up to 4 Kubernetes nodes are supported; service access logs will be stored for two weeks).&lt;/p&gt;

&lt;h3&gt;
  
  
  🎯 Step 2. Set up your CLI environment
&lt;/h3&gt;

&lt;p&gt;The following binaries must be installed on the machine: &lt;code&gt;kubectl&lt;/code&gt;, &lt;code&gt;openssl&lt;/code&gt; (required for creating a certificate authority, CA) and &lt;code&gt;curl&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Assume everything is in place. Let's run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ DATREE_TOKEN=[your-token] bash &amp;lt;(curl https://get.datree.io/admission-webhook)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What you should see and what should happen to your cluster (yes, API requests are additionally encrypted):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;🔑 Generating TLS keys...
Generating a RSA private key
Signature ok
subject=CN = webhook-server.datree.svc
Getting CA Private Key
/home/roman
🔗 Creating webhook secret tls...
secret "webhook-server-tls" deleted
secret/webhook-server-tls created
🔗 Creating core resources...
serviceaccount/webhook-server-datree created
clusterrolebinding.rbac.authorization.k8s.io/rolebinding:webhook-server-datree created
clusterrole.rbac.authorization.k8s.io/webhook-server-datree created
deployment.apps/webhook-server configured
service/webhook-server created
deployment "webhook-server" successfully rolled out
🔗 Creating validation webhook resource...
validatingwebhookconfiguration.admissionregistration.k8s.io/webhook-datree configured
🎉 DONE! The webhook server is now deployed and configured
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  🎯 Step 3. Protect your access token
&lt;/h3&gt;

&lt;p&gt;Because your token is private and you don't want to store it in your repository, we advise setting or changing it with a separate &lt;code&gt;kubectl patch&lt;/code&gt; command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ kubectl patch deployment webhook-server -n datree -p '
spec:
  template:
    spec:
      containers:
        - name: server
          env:
            - name: DATREE_TOKEN
              value: "&amp;lt;your-token&amp;gt;"'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
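A plain `env` value still ends up readable in the Deployment spec. As a sketch (not an official Datree recipe; the Secret name and key below are illustrative), you may instead keep the token in a Kubernetes Secret and reference it with `secretKeyRef`:

```yaml
# Hypothetical Secret holding the Datree token (name and key are illustrative)
apiVersion: v1
kind: Secret
metadata:
  name: datree-token
  namespace: datree
type: Opaque
stringData:
  token: "<your-token>"
---
# Patch fragment: source DATREE_TOKEN from the Secret instead of a literal value
spec:
  template:
    spec:
      containers:
        - name: server
          env:
            - name: DATREE_TOKEN
              valueFrom:
                secretKeyRef:
                  name: datree-token
                  key: token
```

This keeps the token out of manifests committed to the repository, though anyone with read access to Secrets in the namespace can still recover it.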



&lt;h3&gt;
  
  
  🎯 Step 4. Deploy something: magic will work for you
&lt;/h3&gt;

&lt;p&gt;The author does not want to reinvent the wheel, so let's take the classic &lt;code&gt;nginx&lt;/code&gt; deployment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;apps/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Deployment&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nginx-deployment&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;selector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;matchLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nginx&lt;/span&gt;
  &lt;span class="na"&gt;replicas&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;2&lt;/span&gt;
  &lt;span class="na"&gt;template&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nginx&lt;/span&gt;
    &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nginx&lt;/span&gt;
        &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nginx:1.14.2&lt;/span&gt;
        &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;containerPort&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;80&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's focus on the result of the &lt;code&gt;kubectl apply -f nginx-deployment.yaml&lt;/code&gt; routine (the deployment has been &lt;strong&gt;denied&lt;/strong&gt; by the AC):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ kubectl apply -f nginx-deployment.yaml
Error from server: error when creating "nginx-deployment.yaml": admission webhook "webhook-server.datree.svc" denied the request: 
webhook-nginx-deployment-Deployment.tmp.yaml

[V] YAML validation
[V] Kubernetes schema validation

[X] Policy check

❌  Ensure each container has a configured CPU limit  [1 occurrence]
    - metadata.name: nginx-deployment (kind: Deployment)
💡  Missing property object `limits.cpu` - value should be within the accepted boundaries recommended by the organization

❌  Ensure each container has a configured CPU request  [1 occurrence]
    - metadata.name: nginx-deployment (kind: Deployment)
💡  Missing property object `requests.cpu` - value should be within the accepted boundaries recommended by the organization

❌  Ensure each container has a configured liveness probe  [1 occurrence]
    - metadata.name: nginx-deployment (kind: Deployment)
💡  Missing property object `livenessProbe` - add a properly configured livenessProbe to catch possible deadlocks

❌  Ensure each container has a configured memory limit  [1 occurrence]
    - metadata.name: nginx-deployment (kind: Deployment)
💡  Missing property object `limits.memory` - value should be within the accepted boundaries recommended by the organization

❌  Ensure each container has a configured memory request  [1 occurrence]
    - metadata.name: nginx-deployment (kind: Deployment)
💡  Missing property object `requests.memory` - value should be within the accepted boundaries recommended by the organization

❌  Ensure each container has a configured readiness probe  [1 occurrence]
    - metadata.name: nginx-deployment (kind: Deployment)
💡  Missing property object `readinessProbe` - add a properly configured readinessProbe to notify kubelet your Pods are ready for traffic

❌  Prevent workload from using the default namespace  [1 occurrence]
    - metadata.name: nginx-deployment (kind: Deployment)
💡  Incorrect value for key `namespace` - use an explicit namespace instead of the default one (`default`)


(Summary)

- Passing YAML validation: 1/1

- Passing Kubernetes (v1.21.5) schema validation: 1/1

- Passing policy check: 0/1

+-----------------------------------+-----------------------+
| Enabled rules in policy "Default" | 21                    |
| Configs tested against policy     | 1                     |
| Total rules evaluated             | 21                    |
| Total rules skipped               | 0                     |
| Total rules failed                | 7                     |
| Total rules passed                | 14                    |
| See all rules in policy           | https://app.datree.io |
+-----------------------------------+-----------------------+
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Opening the link, you'll be redirected to your personal workspace. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6w70bgsoyktv1kyl8kxc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6w70bgsoyktv1kyl8kxc.png" alt=" " width="800" height="396"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  🎯 Step 5. Is such rigor necessary?
&lt;/h3&gt;

&lt;p&gt;You can audit &lt;a href="https://hub.datree.io/setup/centralized-policy" rel="noopener noreferrer"&gt;reactive policies&lt;/a&gt; and review the invocation history. If the checks are too strict, disable some of the policies. &lt;/p&gt;

&lt;p&gt;For example, not every deployment really needs pre-configured container readiness probes [or CPU &amp;amp; memory limits].  &lt;/p&gt;

&lt;p&gt;Well, &lt;strong&gt;another tryout&lt;/strong&gt; will be with the edited YAML (&lt;a href="https://k8syaml.com/" rel="noopener noreferrer"&gt;Octopus&lt;/a&gt; may be your fellow here):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;apps/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Deployment&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nginx-deployment&lt;/span&gt;
  &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nginx&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;selector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;matchLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
       &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nginx&lt;/span&gt;
  &lt;span class="na"&gt;replicas&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;2&lt;/span&gt;
  &lt;span class="na"&gt;strategy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;RollingUpdate&lt;/span&gt;
  &lt;span class="na"&gt;template&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nginx&lt;/span&gt;
    &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nginx&lt;/span&gt;
          &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nginx:1.14.2&lt;/span&gt;
          &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;containerPort&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;80&lt;/span&gt;
          &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;requests&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;100Mi&lt;/span&gt;
              &lt;span class="na"&gt;cpu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;100m&lt;/span&gt;
            &lt;span class="na"&gt;limits&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;200Mi&lt;/span&gt;
              &lt;span class="na"&gt;cpu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;1'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As we really want to use the &lt;code&gt;default&lt;/code&gt; namespace and are not afraid to do so, let's disable the &lt;code&gt;Prevent workload from using the default namespace&lt;/code&gt; policy in the Datree web UI. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzygl4h25ucxjlxnp20je.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzygl4h25ucxjlxnp20je.png" alt=" " width="685" height="42"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We may also want to free ourselves from the &lt;code&gt;Ensure each container has a configured readiness probe&lt;/code&gt; and &lt;code&gt;Ensure each container has a configured liveness probe&lt;/code&gt; policies.&lt;/p&gt;

&lt;p&gt;Et voilà! 🎭&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ kubectl apply -f nginx-deployment-advanced.yaml
deployment.apps/nginx-deployment created
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now all checks pass successfully: the admission controller has given us a green light 🚦 and allowed the deployment!&lt;/p&gt;

&lt;h3&gt;
  
  
  🎯 Step 6. Hey, do not climb where it is not necessary
&lt;/h3&gt;

&lt;p&gt;If you want &lt;code&gt;datree&lt;/code&gt; to disregard a namespace, add the label &lt;code&gt;admission.datree/validate=skip&lt;/code&gt; to its configuration.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ kubectl label namespaces default "admission.datree/validate=skip"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
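The same label can of course be set declaratively. A minimal namespace manifest (the namespace name is just an example):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: playground                      # example namespace to be ignored
  labels:
    admission.datree/validate: skip    # datree skips resources in this namespace
```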



&lt;h2&gt;
  
  
  How to wipe traces
&lt;/h2&gt;

&lt;p&gt;To delete the label and resume running the &lt;code&gt;datree&lt;/code&gt; webhook on the namespace:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ kubectl label namespaces default "admission.datree/validate-"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Uninstall the webhook
&lt;/h2&gt;

&lt;p&gt;Copy the following command and run it in your terminal to remove the webhook:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ bash &amp;lt;(curl https://get.datree.io/admission-webhook-uninstall)
validatingwebhookconfiguration.admissionregistration.k8s.io "webhook-datree" deleted
service "webhook-server" deleted
deployment.apps "webhook-server" deleted
secret "webhook-server-tls" deleted
clusterrolebinding.rbac.authorization.k8s.io "rolebinding:webhook-server-datree" deleted
serviceaccount "webhook-server-datree" deleted
clusterrole.rbac.authorization.k8s.io "webhook-server-datree" deleted
namespace/kube-system unlabeled
namespace "datree" deleted
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Summing up what was said
&lt;/h2&gt;

&lt;p&gt;As you can see, the possibilities of the Kubernetes API are quite broad. The author hopes that he has not only prepared an overview of a useful solution, but also explained the theoretical aspects behind its functionality.&lt;/p&gt;

</description>
      <category>devops</category>
      <category>kubernetes</category>
      <category>security</category>
      <category>datree</category>
    </item>
    <item>
      <title>Responsible Approach to Communicating With the API Server: Admission Controllers</title>
      <dc:creator>Roman Belshevitz</dc:creator>
      <pubDate>Fri, 02 Sep 2022 12:25:12 +0000</pubDate>
      <link>https://dev.to/otomato_io/responsible-approach-to-communicating-with-the-api-server-admission-controllers-3b49</link>
      <guid>https://dev.to/otomato_io/responsible-approach-to-communicating-with-the-api-server-admission-controllers-3b49</guid>
      <description>&lt;h2&gt;
  
  
  A bit of theory
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;RBAC&lt;/em&gt; and &lt;em&gt;Network policies&lt;/em&gt; are two fundamental security elements of Kubernetes that you probably already know about if you work with it. These mechanisms are helpful for enforcing fundamental guidelines regarding what operations different users or services within your cluster are permitted to carry out.&lt;/p&gt;

&lt;p&gt;However, there are situations when you require &lt;em&gt;more policy features&lt;/em&gt; or granularity than RBAC or network policies can provide. Alternatively, you might want to &lt;em&gt;run additional checks&lt;/em&gt; to verify a resource &lt;em&gt;before&lt;/em&gt; allowing it to join your cluster.&lt;/p&gt;

&lt;p&gt;Admission Controllers (ACs) allow you to add &lt;em&gt;additional options&lt;/em&gt; to the work of Kubernetes to change or &lt;em&gt;validate objects&lt;/em&gt; when making requests to the Kubernetes API. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkswoeuleimjfswqokxy2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkswoeuleimjfswqokxy2.png" alt=" " width="800" height="298"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;🖼️ Pic source: Giant Swarm&lt;/p&gt;

&lt;p&gt;The image shows the various parts that make up the API component. A request initiates communication between the API and the admission controller. After the request has been authenticated, the authorization module determines whether the request issuer is permitted to carry out the operation. The admission magic kicks in once the request has been duly approved.&lt;/p&gt;

&lt;p&gt;If the controller rejects the request, then the entire request to the API server is rejected and an error is returned to the end user.&lt;/p&gt;

&lt;p&gt;To activate the controllers discussed, you must specify their names as a list when creating or updating a cluster. After that, &lt;code&gt;kube-apiserver&lt;/code&gt; will be started or restarted with the &lt;code&gt;--enable-admission-plugins&lt;/code&gt; option and the chosen set of admission controllers.&lt;/p&gt;

&lt;p&gt;Passing a controller that is not available for the current version of Kubernetes will return an appropriate error.&lt;/p&gt;

&lt;h2&gt;
  
  
  What exactly can ACs be?
&lt;/h2&gt;

&lt;h3&gt;
  
  
  🔬 In a scope of implementation
&lt;/h3&gt;

&lt;p&gt;Admission controllers that are built into and shipped with Kubernetes itself are known as &lt;strong&gt;static&lt;/strong&gt; admission controllers. Not all of them are turned on by default. Cloud providers also adopt some of them, or restrict some of them for their own usage. If you own your Kubernetes deployment, you can enable and utilize them. Some examples:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;LimitRanger&lt;/code&gt; &lt;em&gt;makes sure that any of the restrictions&lt;/em&gt; listed in the &lt;code&gt;LimitRange&lt;/code&gt; object in a namespace &lt;em&gt;are not broken&lt;/em&gt; by incoming requests. Use this admission controller to impose those restrictions if you are utilizing &lt;code&gt;LimitRange&lt;/code&gt; objects in your Kubernetes setup. Applying default resource requests to pods without any specifications is also possible with this AC.&lt;/p&gt;
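To illustrate, here is a `LimitRange` object that `LimitRanger` would enforce in a namespace; the numbers are arbitrary defaults chosen for this sketch:

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: dev              # example namespace
spec:
  limits:
    - type: Container
      default:                # applied as limits when a container specifies none
        cpu: 500m
        memory: 256Mi
      defaultRequest:         # applied as requests when a container specifies none
        cpu: 100m
        memory: 128Mi
```

With this in place, a pod created in `dev` without resource settings gets these requests and limits injected, and a pod exceeding them is rejected.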

&lt;p&gt;&lt;code&gt;AlwaysPullImages&lt;/code&gt; &lt;em&gt;changes the image pull policy&lt;/em&gt; for every new Pod. This is useful, for example, in multi-tenant clusters to ensure that only those with the credentials to fetch private images can access them. Without this admission controller, after an image has been pulled to a node, any pod from any user can use it just by knowing the image's name without any authorization checks. This feature must be enabled in the cluster.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;NamespaceLifecycle&lt;/code&gt; &lt;em&gt;enforces that a namespace that is undergoing termination cannot have new objects&lt;/em&gt; created in it, and ensures that requests in a non-existent namespace are rejected.&lt;/p&gt;

&lt;p&gt;And there are &lt;strong&gt;dynamic&lt;/strong&gt; ones. See details below.&lt;/p&gt;

&lt;h3&gt;
  
  
  🔬 In a scope of request processing
&lt;/h3&gt;

&lt;p&gt;There are &lt;em&gt;two types&lt;/em&gt; of dynamic admission controllers in Kubernetes, and they work slightly differently. Put shortly, one just &lt;em&gt;validates&lt;/em&gt; the requests, and the other &lt;em&gt;modifies&lt;/em&gt; them if they aren't up to spec.&lt;/p&gt;

&lt;p&gt;⚙️ &lt;strong&gt;The first type&lt;/strong&gt; is the &lt;strong&gt;validating&lt;/strong&gt; admission controller &lt;code&gt;ValidatingAdmissionWebhook&lt;/code&gt;, which proxies the requests to the subscribed webhooks. The Kubernetes API registers the webhooks based on the resource type and the request method. Every webhook runs some logic to validate the incoming resource, and it replies with a verdict to the API. &lt;/p&gt;

&lt;p&gt;In case the validation webhook rejects the request, the Kubernetes API returns a failed HTTP response to the user. Otherwise, it continues with the next admission.&lt;/p&gt;

&lt;p&gt;⚙️ &lt;strong&gt;The second type&lt;/strong&gt; is the &lt;strong&gt;mutating&lt;/strong&gt; admission controller &lt;code&gt;MutatingAdmissionWebhook&lt;/code&gt;, which alters the resource the user has submitted so that default values can be set or the schema can be verified. Cluster administrators can attach mutation webhooks to the API, which executes them similarly to validation webhooks. &lt;/p&gt;

&lt;h2&gt;
  
  
  Hooks! Hooks are everywhere!
&lt;/h2&gt;

&lt;p&gt;Any resource type, including those that are pre-built like pods, jobs, or services, may be the primary resource type for a controller. The issue is that most built-in resources, if not all of them, already come with associated built-in controllers. In order to prevent having &lt;em&gt;many&lt;/em&gt; controllers update the status of a shared object, &lt;em&gt;custom controllers&lt;/em&gt; are frequently built for special resources. &lt;/p&gt;

&lt;p&gt;If resources are merely Kubernetes API endpoints, writing a controller for a resource is just a fancy &lt;strong&gt;way to bind a request handler to an API endpoint&lt;/strong&gt;! &lt;/p&gt;

&lt;p&gt;Conditional resource modification can be implemented using a so-called webhook, which is essentially an &lt;a href="https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/#request" rel="noopener noreferrer"&gt;API endpoint&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;It is possible &lt;a href="https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/" rel="noopener noreferrer"&gt;to configure dynamically&lt;/a&gt; what resources are subject to what admission webhooks via &lt;code&gt;ValidatingWebhookConfiguration&lt;/code&gt; or &lt;code&gt;MutatingWebhookConfiguration&lt;/code&gt; kinds. &lt;/p&gt;

&lt;p&gt;Both are available in &lt;code&gt;admissionregistration.k8s.io/v1&lt;/code&gt; API version.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ kubectl api-versions | grep admiss
admissionregistration.k8s.io/v1
admissionregistration.k8s.io/v1beta1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
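For reference, a minimal `ValidatingWebhookConfiguration` skeleton; the service name, namespace, path, and `caBundle` are placeholders, and the single rule only watches Deployment creation:

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: example-validating-webhook
webhooks:
  - name: webhook-server.example.svc    # placeholder webhook name
    admissionReviewVersions: ["v1"]
    sideEffects: None
    failurePolicy: Fail                 # reject requests if the webhook is unreachable
    rules:
      - apiGroups: ["apps"]
        apiVersions: ["v1"]
        operations: ["CREATE"]
        resources: ["deployments"]
    clientConfig:
      service:
        name: webhook-server            # placeholder in-cluster service
        namespace: example
        path: /validate
      caBundle: "<base64-encoded-CA-cert>"
```

Note the `failurePolicy` trade-off: `Fail` is safer from a policy standpoint, but an unavailable webhook then blocks matching API requests; `Ignore` keeps the cluster usable at the cost of skipped checks.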



&lt;h2&gt;
  
  
  How would I activate admission controllers?
&lt;/h2&gt;

&lt;p&gt;The Kubernetes API server flag &lt;code&gt;--enable-admission-plugins&lt;/code&gt; accepts a comma-delimited list of AC plugins to invoke before changing cluster objects. For instance, the following command line activates the &lt;code&gt;LimitRanger&lt;/code&gt; and &lt;code&gt;NamespaceLifecycle&lt;/code&gt; admission control plugins:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kube-apiserver --enable-admission-plugins=NamespaceLifecycle,LimitRanger
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;⚠️ Note: You may need to apply the parameters in different ways depending on how your Kubernetes cluster is installed and how the API server is launched. For instance, if Kubernetes is deployed using self-hosted Kubernetes, you may need to alter the manifest file for the API server &lt;a href="https://digitalis.io/blog/kubernetes/k3s-lightweight-kubernetes-made-ready-for-production-part-2/" rel="noopener noreferrer"&gt;and/or the &lt;code&gt;systemd&lt;/code&gt; Unit file&lt;/a&gt; if the API server is installed as a &lt;code&gt;systemd&lt;/code&gt; service.&lt;/p&gt;

&lt;p&gt;⚠️ Note: the &lt;code&gt;admissionregistration.k8s.io/v1beta1&lt;/code&gt; API version is deprecated and has been removed in Kubernetes 1.22+ &lt;/p&gt;

&lt;h2&gt;
  
  
  Public cloud providers' implementation
&lt;/h2&gt;

&lt;p&gt;In this case, everything is already set up for you. &lt;/p&gt;

&lt;p&gt;To learn more about using dynamic admission controllers with Amazon EKS, see the &lt;a href="https://docs.aws.amazon.com/eks/latest/userguide/platform-versions.html" rel="noopener noreferrer"&gt;Amazon EKS documentation&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.microsoft.com/en-us/azure/architecture/operator-guides/aks/aks-triage-controllers" rel="noopener noreferrer"&gt;Azure AKS Policy&lt;/a&gt;, Microsoft's implementation of OPA Gatekeeper, is another interesting thing. Involving AC webhooks,  if there are problems in the admission control pipeline, it can block numerous requests to the API server.&lt;/p&gt;

&lt;p&gt;The VMware Tanzu team followed a similar path in their &lt;a href="https://tanzu.vmware.com/developer/guides/platform-security-admission-control/" rel="noopener noreferrer"&gt;Tanzu Kubernetes Grid (TKG)&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Of course, OPA Gatekeeper itself is a separate and extensive topic, so more on that another time.&lt;/p&gt;

&lt;p&gt;In the ninth article of the series, the author will talk about how smart people were able to translate the theory described above into a useful solution.&lt;/p&gt;

&lt;p&gt;Be careful and stay tuned!&lt;/p&gt;

&lt;p&gt;Many thanks to Leonid Sandler, Douglas Makey &lt;a class="mentioned-user" href="https://dev.to/douglasmakey"&gt;@douglasmakey&lt;/a&gt;  Mendez Molero, Luca 🐦 @LucaDiMaio11 Di Maio, Kristijan Mitevski and W.T. Chang!&lt;/p&gt;

</description>
      <category>devops</category>
      <category>kubernetes</category>
      <category>security</category>
      <category>api</category>
    </item>
    <item>
      <title>Virtual Kubernetes Clusters: What Are They Needed For?</title>
      <dc:creator>Roman Belshevitz</dc:creator>
      <pubDate>Mon, 29 Aug 2022 18:58:50 +0000</pubDate>
      <link>https://dev.to/otomato_io/virtual-kubernetes-clusters-what-are-they-needed-for-4fdd</link>
      <guid>https://dev.to/otomato_io/virtual-kubernetes-clusters-what-are-they-needed-for-4fdd</guid>
      <description>&lt;h2&gt;
  
  
  Developer Wishlist Never Ends
&lt;/h2&gt;

&lt;p&gt;Imagine you can&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;have &lt;strong&gt;many&lt;/strong&gt; virtual clusters &lt;strong&gt;within a single&lt;/strong&gt; cluster, and &lt;/li&gt;
&lt;li&gt;they are &lt;strong&gt;much cheaper&lt;/strong&gt; than the traditional Kubernetes clusters, and &lt;/li&gt;
&lt;li&gt;they require &lt;strong&gt;lower&lt;/strong&gt; management and maintenance &lt;strong&gt;efforts&lt;/strong&gt;. &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Sounds intriguing, eh? This makes v/clusters ideal for running experiments, continuous integration, and setting up &lt;strong&gt;sandbox&lt;/strong&gt; 🧪 environments.&lt;/p&gt;

&lt;p&gt;So, Loft Labs created such a solution, written natively in Golang, and released it as an ~2k⭐ &lt;a href="https://github.com/loft-sh/vcluster" rel="noopener noreferrer"&gt;open source&lt;/a&gt; project.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's under the hood?
&lt;/h2&gt;

&lt;p&gt;Virtual clusters are fully functional Kubernetes clusters that run on top of other Kubernetes clusters. Instead of being completely separate "real" clusters, virtual clusters utilize the worker nodes and networking of the host cluster. They schedule all workloads into a single namespace of the host cluster and have their own control plane. Much like virtual machines, virtual clusters divide a single physical cluster into several distinct ones.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1k34razhtr4v2t1dp3n8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1k34razhtr4v2t1dp3n8.png" alt=" " width="800" height="296"&gt;&lt;/a&gt;&lt;br&gt;
🖼️ Right click, don't even think too long.&lt;/p&gt;

&lt;p&gt;Only the essential Kubernetes components make up the virtual cluster itself: the API server, the controller manager, a storage backend (such as etcd, sqlite, mysql, etc.), and, optionally, a scheduler. In order to minimize virtual cluster overhead, &lt;code&gt;vcluster&lt;/code&gt; by default builds on &lt;code&gt;k3s&lt;/code&gt;, a fully functional, certified, lightweight Kubernetes distribution that compiles the Kubernetes components into a single binary and disables all unnecessary Kubernetes features by default, such as the pod scheduler or specific controllers.&lt;/p&gt;

&lt;p&gt;Other Kubernetes distributions, &lt;a href="https://www.vcluster.com/docs/operator/other-distributions" rel="noopener noreferrer"&gt;such as k0s and vanilla k8s&lt;/a&gt;, are supported in addition to k3s. Besides the control plane, the virtual cluster also includes a Kubernetes hypervisor that simulates networking and worker nodes. Between the virtual and host clusters, this component syncs a few key resources that are crucial for cluster functionality:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Pods&lt;/strong&gt;: All pods started in the virtual cluster are rewritten before being launched in the virtual cluster's namespace in the host cluster. Environment variables, DNS, service account tokens, and other configurations are updated to point to the virtual cluster rather than the host cluster. From inside the pod, it appears that the pod runs in the virtual cluster rather than the host cluster.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Services&lt;/strong&gt;: All services and endpoints are rewritten and created in the virtual cluster's namespace in the host cluster. The service cluster IPs are shared by the host cluster and the virtual cluster. This implies that there are no performance consequences when a service in the host cluster is accessed from within the virtual cluster.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PersistentVolumeClaims&lt;/strong&gt;: Persistent volume claims created in the virtual cluster are rewritten and created in the virtual cluster's namespace in the host cluster. If they are bound in the host cluster, the relevant persistent volume data is synced back to the virtual cluster.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ConfigMaps &amp;amp; Secrets&lt;/strong&gt;: Only ConfigMaps and Secrets mounted to pods within the virtual cluster are synced to the host cluster; all other ConfigMaps and Secrets are retained only within the virtual cluster.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Other Resources&lt;/strong&gt;: Deployments, StatefulSets, CRDs, service accounts, etc. do not sync with the host cluster; they reside only in the virtual cluster.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Who lost the magic mirror?
&lt;/h2&gt;

&lt;p&gt;For each pod with a &lt;code&gt;spec.nodeName&lt;/code&gt; value it encounters inside the virtual cluster, &lt;code&gt;vcluster&lt;/code&gt; by default creates a &lt;em&gt;fake&lt;/em&gt; node. These &lt;em&gt;fake&lt;/em&gt; nodes are created because &lt;code&gt;vcluster&lt;/code&gt; &lt;em&gt;does not&lt;/em&gt; by default &lt;em&gt;have RBAC permissions&lt;/em&gt; to access the real nodes in the host cluster; doing so would require a cluster role and cluster role binding. Additionally, each node gets a fake &lt;code&gt;kubelet&lt;/code&gt; endpoint that will either forward requests to the real node &lt;em&gt;or rewrite them&lt;/em&gt; to keep virtual cluster names intact.&lt;/p&gt;

&lt;p&gt;Vcluster supports multiple modes to customize node syncing behavior. For a detailed list of the resources that can be synced, &lt;a href="https://www.vcluster.com/docs/architecture/synced-resources" rel="noopener noreferrer"&gt;see details here&lt;/a&gt; in the docs.&lt;/p&gt;

&lt;p&gt;Besides synchronizing virtual and host cluster resources, the hypervisor also proxies certain Kubernetes API calls, such as pod port forwarding or container command execution, to the host cluster. It essentially acts as the virtual cluster's reverse proxy.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiay4g5cnqmdlgak8mxih.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiay4g5cnqmdlgak8mxih.png" alt=" " width="766" height="399"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To ensure proper network operation for the virtual cluster, resources like &lt;code&gt;Service&lt;/code&gt; and &lt;code&gt;Ingress&lt;/code&gt; are synced by default &lt;em&gt;from&lt;/em&gt; the virtual cluster [down] &lt;em&gt;to&lt;/em&gt; the host cluster.&lt;/p&gt;
&lt;h2&gt;
  
  
  There are never too many levels of abstraction
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4inkwzd1ss3rnja32wie.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4inkwzd1ss3rnja32wie.png" alt=" " width="711" height="331"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Certain resources (such as CRDs or RBAC policies) are &lt;em&gt;cluster-wide&lt;/em&gt;, and you can’t isolate them using &lt;em&gt;namespaces&lt;/em&gt;. For instance, it is not feasible to install multiple versions of an operator simultaneously in the same cluster.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ kubectl api-resources --namespaced=false|true
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Although Kubernetes itself already offers namespaces to separate environments, namespaces cannot isolate cluster-scoped resources or the control plane.&lt;/p&gt;

&lt;p&gt;In many circumstances, virtual clusters are also more resilient than namespaces. The virtual cluster keeps its own Kubernetes resource objects in its own data store; these resources are unknown to the host cluster.&lt;/p&gt;

&lt;p&gt;This kind of isolation is good for resilience. Engineers who adopt namespace-based isolation still need access to cluster-scoped resources such as cluster roles, shared CRDs, or persistent volumes. If an engineer breaks one of these shared resources, every team that depends on it will likely experience failures.&lt;/p&gt;

&lt;p&gt;Finally, virtual cluster configuration &lt;em&gt;is independent of physical&lt;/em&gt; cluster configuration. This is excellent for multi-tenancy, since you can easily spin up a fresh environment or an impressive demo application. 😎&lt;/p&gt;

&lt;h2&gt;
  
  
  How it looks in your CLI
&lt;/h2&gt;

&lt;p&gt;Create a file &lt;code&gt;vcluster.yaml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;vcluster&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;rancher/k3s:v1.23.5-k3s1&lt;/span&gt;   &lt;span class="c1"&gt;# Choose k3s version&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then, install the helm chart using &lt;code&gt;vcluster.yaml&lt;/code&gt; for the chart values:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;helm upgrade --install my-vcluster vcluster \
  --values vcluster.yaml \
  --repo https://charts.loft.sh \
  --namespace host-namespace-1 \
  --repository-config=''
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Access:
&lt;/h3&gt;

&lt;p&gt;Get the admin tool:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl -s -L "https://github.com/loft-sh/vcluster/releases/latest" | sed -nE 's!.*"([^"]*vcluster-linux-amd64)".*!https://github.com\1!p' | xargs -n 1 curl -L -o vcluster &amp;amp;&amp;amp; chmod +x vcluster;
sudo mv vcluster /usr/local/bin;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then, connect:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Connect and switch the current context to the vcluster
vcluster connect my-vcluster -n my-vcluster

# Switch back context
vcluster disconnect
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You have an option to create a separate &lt;code&gt;kubeconfig&lt;/code&gt; to use instead of changing the current context:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;vcluster connect my-vcluster --update-current=false
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or you may execute a command directly in the &lt;code&gt;vcluster&lt;/code&gt; context without changing the &lt;em&gt;current&lt;/em&gt; context:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;vcluster connect my-vcluster -- kubectl get namespaces
vcluster connect my-vcluster -- bash
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Usage:
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Run any kubectl, helm, etc. command in your vcluster
kubectl get namespace
kubectl get pods -n kube-system
kubectl create namespace demo-nginx
kubectl create deployment nginx-deployment -n demo-nginx --image=nginx
kubectl get pods -n demo-nginx
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Cleanup:
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;helm delete my-vcluster -n vcluster-my-vcluster --repository-config=''
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  What if you're planning some serious thing?
&lt;/h2&gt;

&lt;p&gt;Well, the stock K8s distribution &lt;em&gt;is compatible with high availability&lt;/em&gt; in &lt;code&gt;vcluster&lt;/code&gt;. What does high availability mean here? One aspect is making the &lt;code&gt;etcd&lt;/code&gt; database more robust; another is boosting the syncer's performance. As mentioned above, &lt;code&gt;vcluster&lt;/code&gt; uses a so-called syncer, which copies the pods created within the virtual cluster to the underlying host cluster. &lt;/p&gt;

&lt;p&gt;🪲&lt;strong&gt;TL;DR #1:&lt;/strong&gt; &lt;code&gt;etcd&lt;/code&gt; uses a leader-based consensus protocol for consistent data replication and log execution. The etcd cluster members elect a single leader; all other members become followers. When the leader fails, the cluster automatically elects a new one. The election does not happen immediately after the failure, though: since failure detection is timeout-based, electing a new leader takes roughly an election timeout. &lt;/p&gt;

&lt;p&gt;🪲&lt;strong&gt;TL;DR #2:&lt;/strong&gt; Why &lt;em&gt;a minimum of three instances&lt;/em&gt; is recommended for an etcd cluster is &lt;a href="https://etcd.io/docs/v3.5/faq/" rel="noopener noreferrer"&gt;well described here&lt;/a&gt;, first-hand info.&lt;/p&gt;
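&lt;p&gt;To make the three-member recommendation concrete, here is a minimal sketch (plain shell arithmetic, unrelated to any vcluster tooling): an etcd cluster of n members needs a quorum of floor(n/2) + 1 votes, so it tolerates n minus quorum member failures.&lt;/p&gt;

```shell
# Quorum size and fault tolerance for common etcd cluster sizes:
# quorum = n/2 + 1 (integer division), tolerated failures = n - quorum.
for n in 1 3 5; do
  quorum=$(( n / 2 + 1 ))
  tolerated=$(( n - quorum ))
  echo "members=$n quorum=$quorum tolerated_failures=$tolerated"
done
```

&lt;p&gt;Note that a two-member cluster would tolerate zero failures, which is why three is the sensible minimum.&lt;/p&gt;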

&lt;p&gt;Currently, vcluster's high availability setup does not support single-binary distributions like &lt;code&gt;k0s&lt;/code&gt; and &lt;code&gt;k3s&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Create a &lt;code&gt;values.yaml&lt;/code&gt; with the following structure in order to run &lt;code&gt;vcluster&lt;/code&gt; in high availability mode:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Enable HA mode&lt;/span&gt;
&lt;span class="na"&gt;enableHA&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;

&lt;span class="c1"&gt;# Scale up syncer replicas&lt;/span&gt;
&lt;span class="na"&gt;syncer&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;replicas&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;

&lt;span class="c1"&gt;# Scale up etcd&lt;/span&gt;
&lt;span class="na"&gt;etcd&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;replicas&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;

&lt;span class="c1"&gt;# Scale up controller manager&lt;/span&gt;
&lt;span class="na"&gt;controller&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;replicas&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;

&lt;span class="c1"&gt;# Scale up api server&lt;/span&gt;
&lt;span class="na"&gt;api&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;replicas&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;

&lt;span class="c1"&gt;# Scale up DNS server&lt;/span&gt;
&lt;span class="na"&gt;coredns&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;replicas&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  To summarize
&lt;/h2&gt;

&lt;p&gt;A fully functioning virtual Kubernetes cluster can be built with &lt;code&gt;vcluster&lt;/code&gt;! Each &lt;code&gt;vcluster&lt;/code&gt; runs inside a namespace of the underlying K8s cluster. It provides better multi-tenancy and isolation than conventional namespaces, and it is cheaper than running separate, fully-fledged clusters.&lt;/p&gt;

&lt;p&gt;Virtual clusters can be a good alternative to running numerous instances of &lt;code&gt;k3s&lt;/code&gt; or &lt;code&gt;k0s&lt;/code&gt; side by side, but they &lt;em&gt;cannot exist on their own without a host&lt;/em&gt; cluster. &lt;/p&gt;

&lt;p&gt;Compared to fully independent Kubernetes clusters, they are faster, lighter, and simpler to spin up. So give virtual clusters a shot &lt;a href="https://komodor.com/learn/git-revert-rolling-back-in-gitops-and-kubernetes/" rel="noopener noreferrer"&gt;if you're tired&lt;/a&gt; of having to reset your local or CI/CD Kubernetes clusters all the time. However, that is a topic for a completely different story, much sadder than what you just read.&lt;/p&gt;

&lt;p&gt;Be in good &amp;amp; non-ghost shape! 👻&lt;/p&gt;

&lt;p&gt;Many thanks to Viktor 🐦@vfarcic Farcic and Mauricio 🐦&lt;a class="mentioned-user" href="https://dev.to/salaboy"&gt;@salaboy&lt;/a&gt; Salatino for inspiration!&lt;/p&gt;

</description>
      <category>devops</category>
      <category>kubernetes</category>
      <category>productivity</category>
      <category>k3s</category>
    </item>
    <item>
      <title>How to Stop Rampant Kubernetes Cluster Growth</title>
      <dc:creator>Roman Belshevitz</dc:creator>
      <pubDate>Thu, 25 Aug 2022 18:20:00 +0000</pubDate>
      <link>https://dev.to/otomato_io/how-to-stop-rampant-kubernetes-cluster-growth-4eip</link>
      <guid>https://dev.to/otomato_io/how-to-stop-rampant-kubernetes-cluster-growth-4eip</guid>
      <description>&lt;h2&gt;
  
  
  Some lyrics as an introduction
&lt;/h2&gt;

&lt;p&gt;Edvard Munch's famous painting "The Scream" was first presented to the public at a Berlin exhibition in December 1893. It was conceived as part of the &lt;a href="https://www.dailyartmagazine.com/edvard-munch-and-the-frieze-of-life/" rel="noopener noreferrer"&gt;"Frieze of Life"&lt;/a&gt;, a program cycle of paintings about the spiritual life of a person. Munch wrote about it: &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“The Frieze of Life” is conceived as a series of paintings connected with each other, which together should give a description of a whole life. A winding line of the coast passes through the picture, behind it is the sea, it is always in motion, and under the crowns of trees there is a diverse life with its sorrows and joys. Frieze is conceived as a poem about life, love and death.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The author of this brief will, of course, not talk about the spiritual life, but about practical approaches that keep thoughts of the terrible and otherworldly at bay and save engineers' nerves.&lt;/p&gt;

&lt;h2&gt;
  
  
  The essence of the Ops' problem
&lt;/h2&gt;

&lt;p&gt;Kubernetes was originally designed to support the consolidation of workloads on a &lt;em&gt;single&lt;/em&gt; cluster. However, there are many problematic scenarios that require a &lt;em&gt;multi-cluster approach&lt;/em&gt; to optimize performance. These may include workloads across regions, fault propagation radius limits, compliance issues, harsh multi-user environments, security, and custom software solutions.&lt;/p&gt;

&lt;p&gt;Unfortunately, this multi-cluster approach poses management challenges, as the complexity of managing a Kubernetes cluster only increases as the size of the cluster increases. The end result is a phenomenon called &lt;em&gt;cluster sprawl&lt;/em&gt;, which occurs when the number of clusters and workloads grows and is not managed coherently.&lt;/p&gt;

&lt;p&gt;The solution to this problem lies in the early and rapid identification and implementation of the best management practices in order to avoid serious work in the future.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is Kubernetes governance?
&lt;/h2&gt;

&lt;p&gt;Governance refers to a well-defined collection of rules, policies, and procedures that ensures accountability, transparency, and responsibility.&lt;/p&gt;

&lt;p&gt;Governance is also about synchronizing clusters and providing centralized policy management. Kubernetes' governance is defined as a set of rules created with policies that need to be enforced across all clusters. This is a critical component for large enterprises running Kubernetes.&lt;/p&gt;

&lt;p&gt;Typically, this process means applying matching rules across Kubernetes multi-clusters, as well as to the applications running in those clusters. And while governing Kubernetes may seem like overhead, it pays off in the long run, especially in a large organization.&lt;/p&gt;

&lt;p&gt;Assume the enterprise keeps increasing the number of clusters in use without applying governance. These clusters will live under different rules, which will create a huge amount of extra work for the teams in the near future.&lt;/p&gt;

&lt;p&gt;Fortunately, there are only a few very important components to building a successful Kubernetes governance.&lt;/p&gt;

&lt;h2&gt;
  
  
  Creating successful Kubernetes governance
&lt;/h2&gt;

&lt;p&gt;When considering a successful Kubernetes governance strategy, the first component is to ensure good multi-cluster management and monitoring. You must maintain control over how and where clusters are created and configured, as well as which software versions can be used.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frp9fax3ns5gh9szpzt9u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frp9fax3ns5gh9szpzt9u.png" alt=" " width="500" height="376"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  🔧 Well-built observability
&lt;/h3&gt;

&lt;p&gt;Application development and operations teams should be able to centrally view and manage clusters to better optimize resources and troubleshoot. Solutions in this area are developed, for example, by &lt;a href="https://www.redhat.com/en/technologies/management/advanced-cluster-management" rel="noopener noreferrer"&gt;Red Hat&lt;/a&gt;, &lt;a href="https://platform9.com/blog/eks-plug-and-play-centralized-management-of-your-applications-across-aws-eks-clusters/" rel="noopener noreferrer"&gt;Platform9&lt;/a&gt;,  &lt;a href="https://polaris.docs.fairwinds.com/" rel="noopener noreferrer"&gt;Fairwinds&lt;/a&gt; and even &lt;a href="https://github.com/rancher/opni" rel="noopener noreferrer"&gt;Rancher Labs&lt;/a&gt;. Improved management practices and greater transparency can also save a company from the headaches of a range of security risks and performance issues down the road.&lt;/p&gt;

&lt;h3&gt;
  
  
  🔧 RBAC strategies
&lt;/h3&gt;

&lt;p&gt;Next, enterprises must have an authentication and access control system in place. Having centralized authentication and authorization will help an organization streamline the login process and help keep track of user activity. This will allow application development and operations teams &lt;a href="https://www.techtarget.com/searchitoperations/tutorial/Be-selective-with-Kubernetes-RBAC-permissions" rel="noopener noreferrer"&gt;to ensure that the right people&lt;/a&gt; are doing important tasks in real time.&lt;/p&gt;

&lt;h3&gt;
  
  
  🔧 Policy management
&lt;/h3&gt;

&lt;p&gt;Finally, to govern Kubernetes, enterprises must optimize policy management. Companies need to think about how Kubernetes will impact their development culture and work on finding the right balance between business agility and control. Ultimately, governance (with the appropriate level of flexibility) ensures that businesses can meet customer needs and deploy mission-critical services in a consistent and reliable manner.&lt;/p&gt;

&lt;p&gt;In Kubernetes, Admission Controllers enforce policies on objects during create, update, and delete operations. Admission control is fundamental to policy enforcement in Kubernetes.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://kubernetes.io/blog/2019/03/21/a-guide-to-kubernetes-admission-controllers/" rel="noopener noreferrer"&gt;Admission controllers&lt;/a&gt; allow you to enforce the adherence to certain practices such as having good labels, annotations, resource limits, or other settings.&lt;/p&gt;

&lt;p&gt;As a CNCF project, Open Policy Agent (OPA) is a great tool to develop and implement such policies at scale throughout an organization. Every request goes through OPA, as illustrated below, and is evaluated against the policies established for the Kubernetes cluster. If the request complies with the policies, it is carried out; if it violates them, OPA rejects it.&lt;/p&gt;

&lt;p&gt;As a good practice, by &lt;a href="https://www.openpolicyagent.org/docs/latest/kubernetes-introduction/#how-does-it-work-with-plain-opa-and-kube-mgmt" rel="noopener noreferrer"&gt;deploying OPA&lt;/a&gt; as an admission controller, you can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Require specific labels on all resources.&lt;/li&gt;
&lt;li&gt;Require that container images come from the corporate image registry.&lt;/li&gt;
&lt;li&gt;Require that all pods specify resource requests and limits.&lt;/li&gt;
&lt;li&gt;Prevent conflicting Ingress objects from being created.&lt;/li&gt;
&lt;/ul&gt;
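&lt;p&gt;As an illustrative sketch (an assumption, not from the OPA docs linked above), the first rule can be enforced with OPA Gatekeeper's community &lt;code&gt;K8sRequiredLabels&lt;/code&gt; constraint; the constraint name and required label are hypothetical:&lt;/p&gt;

```yaml
# Assumes the K8sRequiredLabels ConstraintTemplate from the Gatekeeper
# community policy library is already installed; names are illustrative.
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: ns-must-have-owner
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Namespace"]
  parameters:
    labels: ["owner"]
```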

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgjlt13feinocj5p1hkmq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgjlt13feinocj5p1hkmq.png" alt=" " width="789" height="359"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Goals to achieve
&lt;/h2&gt;

&lt;p&gt;But what should be the goals of governance? Where should it be enforced and tested? The four most effective management objectives are security policy, network management, access control, and image management. Let's look at each of these goals one by one:&lt;/p&gt;

&lt;h3&gt;
  
  
  🎯 Security policy
&lt;/h3&gt;

&lt;p&gt;In security policies for governing Kubernetes, it is important to restrict user access to pods in clusters. Cluster users should have well-defined access based on their role.&lt;/p&gt;

&lt;p&gt;To do this, enterprises must implement a security policy that will have rules and conditions related to access and privileges. In this policy, they must specify that containers have read-only access to the file system and that containers and child processes cannot be subject to privilege changes.&lt;/p&gt;
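&lt;p&gt;A minimal sketch of what such restrictions look like in a pod spec (the names and image are hypothetical):&lt;/p&gt;

```yaml
# Illustrative pod spec enforcing the restrictions described above.
apiVersion: v1
kind: Pod
metadata:
  name: hardened-app                        # hypothetical name
spec:
  containers:
    - name: app
      image: registry.example.com/app:1.0   # hypothetical image
      securityContext:
        readOnlyRootFilesystem: true        # read-only access to the filesystem
        allowPrivilegeEscalation: false     # no privilege changes for child processes
        runAsNonRoot: true
```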

&lt;h3&gt;
  
  
  🎯 Network management
&lt;/h3&gt;

&lt;p&gt;Network policy plays a very important role in determining which services can communicate with each other. Here, companies must determine which pods and services may interact with each other and which should be isolated. This also applies to pod security in Kubernetes governance.&lt;/p&gt;

&lt;p&gt;The right approach aims to control traffic within Kubernetes clusters. Policies can be based on pods, namespaces, or IPs, depending on governance requirements.&lt;/p&gt;

&lt;p&gt;Each popular CNI plugin uses a different type of configuration for the network setup. For example, &lt;a href="https://projectcalico.docs.tigera.io/networking/determine-best-networking" rel="noopener noreferrer"&gt;Calico&lt;/a&gt; uses layer 3 networking paired with the BGP routing protocol to connect pods.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://dev.to/otomato_io/cilium-ebpf-powered-cni-a-nos-solution-for-modern-clouds-1hl1"&gt;Cilium&lt;/a&gt; configures an overlay network with eBPF on layers 3 to 7. Along with Calico, Cilium supports setting up network policies to restrict traffic.&lt;/p&gt;

&lt;h3&gt;
  
  
  🎯 Administration and access control
&lt;/h3&gt;

&lt;p&gt;In access control, when configuring role-based access control (RBAC) policy, administrators need to restrict access to cluster resources. Using Kubernetes objects such as &lt;code&gt;Role&lt;/code&gt;, &lt;code&gt;ClusterRole&lt;/code&gt;, &lt;code&gt;RoleBinding&lt;/code&gt;, and &lt;code&gt;ClusterRoleBinding&lt;/code&gt;, they need to fine-tune access to cluster resources appropriately.&lt;/p&gt;

&lt;p&gt;Because permissions granted by a &lt;code&gt;ClusterRole&lt;/code&gt; apply across the entire cluster, you can use &lt;code&gt;ClusterRole&lt;/code&gt;s to control access to different kinds of resources than you can with &lt;code&gt;Role&lt;/code&gt;s. These include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cluster-scoped resources such as nodes&lt;/li&gt;
&lt;li&gt;Non-resource REST Endpoints &lt;a href="https://kubernetes.io/docs/reference/using-api/health-checks/" rel="noopener noreferrer"&gt;such as&lt;/a&gt; &lt;code&gt;/healthz&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Namespaced&lt;/em&gt; resources across all Namespaces (for example, all Pods across the entire cluster, regardless of Namespace).&lt;/li&gt;
&lt;/ul&gt;
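&lt;p&gt;For contrast, a namespaced &lt;code&gt;Role&lt;/code&gt; limits the same kind of grant to a single namespace; a minimal sketch with hypothetical names:&lt;/p&gt;

```yaml
# A namespaced Role granting read-only access to pods; names are illustrative.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: test
  name: pod-reader
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "watch"]
```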

&lt;p&gt;After creating a &lt;code&gt;Role&lt;/code&gt; or &lt;code&gt;ClusterRole&lt;/code&gt;, &lt;a href="https://learnk8s.io/rbac-kubernetes" rel="noopener noreferrer"&gt;you have to assign it&lt;/a&gt; to a user or group of users by creating a &lt;code&gt;RoleBinding&lt;/code&gt; or &lt;code&gt;ClusterRoleBinding&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;rbac.authorization.k8s.io/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ClusterRoleBinding&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;testadminclusterbinding&lt;/span&gt;
&lt;span class="na"&gt;subjects&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ServiceAccount&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;myaccount&lt;/span&gt;
    &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;test&lt;/span&gt;
&lt;span class="na"&gt;roleRef&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ClusterRole&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;cluster-admin&lt;/span&gt;
  &lt;span class="na"&gt;apiGroup&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;rbac.authorization.k8s.io&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  🎯 Image management
&lt;/h3&gt;

&lt;p&gt;Using public Docker images can increase the speed and flexibility of application development, but there are many vulnerable Docker images, and using them in a production cluster can be very risky.&lt;/p&gt;

&lt;p&gt;Image management is also part of Kubernetes governance. All images that will be used in the cluster must be pre-scanned for vulnerabilities. There are several approaches to finding vulnerabilities. How and where an organization checks for vulnerabilities depends on its preferred workflows. However, it is recommended that you test your images before deploying them to a cluster.&lt;/p&gt;

&lt;p&gt;Hacker activity has increased exponentially in recent years, and loopholes in systems continue to be discovered. Therefore, it is very important for companies to be vigilant when implementing practices to ensure that they only use official, clean, and verified Docker images on a cluster.&lt;/p&gt;

&lt;p&gt;Threat actors can mount sophisticated attacks by hiding malicious scripts or malware in a container image, turning previously trusted third-party artifacts into an attack vector. Static, pattern-based, or signature-based scanners are not effective against this kind of attack because it only manifests at runtime.&lt;/p&gt;

&lt;p&gt;By evaluating the attack kill chain and running images in a secure hosted sandbox environment, several security solutions can reduce this risk. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwa3wpii8ine0jptfqydk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwa3wpii8ine0jptfqydk.png" alt=" " width="800" height="272"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;These tools, e.g. &lt;a href="https://github.com/aquasecurity/trivy" rel="noopener noreferrer"&gt;trivy&lt;/a&gt; by Aqua Security, are frequently &lt;a href="https://github.com/aquasecurity/trivy-action" rel="noopener noreferrer"&gt;incorporated into CI/CD&lt;/a&gt; pipelines to examine images in a running state both before and after an image is checked into a registry. Malicious behavior or unmet policy requirements can flag an image for registry deletion or block check-in entirely.&lt;/p&gt;
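&lt;p&gt;A sketch of how such a scan might look as a GitHub Actions step using the trivy-action mentioned above (the image reference and thresholds are assumptions):&lt;/p&gt;

```yaml
# Hypothetical CI step: fail the build on HIGH/CRITICAL findings.
- name: Scan image with Trivy
  uses: aquasecurity/trivy-action@master
  with:
    image-ref: registry.example.com/app:1.0   # hypothetical image
    severity: CRITICAL,HIGH
    exit-code: '1'
```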

&lt;h2&gt;
  
  
  Instead of conclusion
&lt;/h2&gt;

&lt;p&gt;Thus, the author has briefly outlined the directions needed to better govern Kubernetes, ensure the security of important enterprise systems and data, and limit cluster growth and possible disorder. Stay strong and focused!&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The author is thankful to Arthur Chiao, Oleg Chunikhin (CNCF), Tomas Fernandez (Rendered Text / Semaphore), Mike Jordan (Coredge), Kristijan Mitevski and Steven Zimmerman (Aqua Security) for their contribution to the community.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>devops</category>
      <category>kubernetes</category>
      <category>productivity</category>
      <category>team</category>
    </item>
  </channel>
</rss>
