<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Hossein Zolfi</title>
    <description>The latest articles on DEV Community by Hossein Zolfi (@empire).</description>
    <link>https://dev.to/empire</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1403307%2F73075bb6-daee-4ae1-ac7a-73703407f93b.jpeg</url>
      <title>DEV Community: Hossein Zolfi</title>
      <link>https://dev.to/empire</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/empire"/>
    <language>en</language>
    <item>
      <title>The Hidden Tax You Are Paying Every Day: A Dev's Journey into Ops Automation</title>
      <dc:creator>Hossein Zolfi</dc:creator>
      <pubDate>Sun, 23 Nov 2025 05:09:02 +0000</pubDate>
      <link>https://dev.to/empire/the-hidden-tax-you-are-paying-every-day-a-devs-journey-into-ops-automation-5f80</link>
      <guid>https://dev.to/empire/the-hidden-tax-you-are-paying-every-day-a-devs-journey-into-ops-automation-5f80</guid>
      <description>&lt;p&gt;Starting a new career is kinda like walking into a workshop full of tools you've never seen before. You know they're useful, maybe even powerful, but honestly... you're not totally sure what they do or whether you'll break something the moment you touch them. That's me right now. Everything feels new. And confusing. And fun. Sometimes all at once.&lt;/p&gt;

&lt;p&gt;This article is part of a series where I'm basically thinking out loud about the stuff I'm learning. I want my future self to read these later and go like: "Oh wow, that's where I started." Or maybe he'll laugh at me. Either way, it's worth documenting.&lt;/p&gt;

&lt;p&gt;Along the way I ran into a bunch of questions. Some I answered. Some I couldn't. Some probably don't even have a real answer. I'm gonna put them here anyway because they're part of the journey.&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 1: Identify the Problem (aka "Why am I doing this manually??")
&lt;/h2&gt;

&lt;p&gt;One of the things I'm struggling with is: how do I share operations with my future self? Suppose I need to set up a client on a machine so it can ship logs to external servers. How should I do it? How will I remember what I did last time? By reading documentation or digging through my command history? Seriously? And what about urgent situations: what happens if we lose a VM? How do I know what's even on these machines?&lt;/p&gt;




&lt;blockquote&gt;
&lt;p&gt;💰 The CEO/Manager ROI Corner&lt;/p&gt;

&lt;p&gt;If you are a manager reading this, here is the math: Manual ops is a hidden tax. Every hour I spend SSH-ing into a box to fix a config is an hour I am not building features. Plus, manual changes mean human error, which means downtime risk. Automation isn't just "cool", it is risk management.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Coming back to the ops world after doing dev work, my first thought when I saw these GCP instances was: &lt;strong&gt;how do I get rid of manual work?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Throughout my entire life as a software engineer, I've never been able to persuade myself to do repetitive work manually. I just... I can't. I hate it. If I have to do the same clicks and commands more than twice, I literally start to feel dizzy. It just feels wrong.&lt;/p&gt;

&lt;p&gt;This is basically the &lt;a href="https://en.wikipedia.org/wiki/Don%27t_repeat_yourself" rel="noopener noreferrer"&gt;DRY Principle (Don't Repeat Yourself)&lt;/a&gt; applied to infrastructure.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Every piece of knowledge must have a single, unambiguous, authoritative representation within a system&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;To save time in these kinds of situations the obvious answer is docs. Notion, Confluence, Google Docs, whatever. But... why? Just to tell other people (technical or not) what's installed? And how I set it up? Especially since not everyone can just ssh in and check.&lt;/p&gt;

&lt;p&gt;That just doesn't work for me. We're software developers. We're supposed to like code, right? We live in it. And for me, running a command from a script is always safer and faster than typing it manually. Just because someone else does this work by hand doesn't mean I should. So, in between my other tasks, I started exploring my options.&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 2: Exploring Options
&lt;/h2&gt;

&lt;p&gt;The real answer, for me, has to involve Git. A repo.&lt;/p&gt;

&lt;p&gt;My mind immediately went to Ansible. But I did pause for a second to think about the other players in the room.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Ansible&lt;/strong&gt; is open-source, command-line IT automation software written in Python. It can configure systems, deploy software, and orchestrate advanced workflows to support application deployment, system updates, and more.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That's exactly what I need.&lt;/p&gt;

&lt;p&gt;Here’s a quick mental matrix I went through:&lt;br&gt;
&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;The Vibe&lt;/th&gt;
&lt;th&gt;Why I picked (or didn't pick) it&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Terraform&lt;/td&gt;
&lt;td&gt;The "Industry Standard" for infra. Great for creating VMs, less great for configuring inside them.&lt;/td&gt;
&lt;td&gt;Overkill for right now. I have the VMs; I need to configure the OS. The learning curve is steep when I just need to install Docker.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pulumi&lt;/td&gt;
&lt;td&gt;"Infrastructure as Code" but actual code (TS/Python).&lt;/td&gt;
&lt;td&gt;Super cool, but adds complexity I don't need yet. I want simple config files, not a compile step.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Ansible&lt;/td&gt;
&lt;td&gt;The "Old Reliable". Agentless, runs over SSH.&lt;/td&gt;
&lt;td&gt;Winner. It uses YAML, handles server configuration perfectly, and I already know the basics.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;Decision Point: Why Ansible?&lt;/strong&gt;&lt;/p&gt;








&lt;blockquote&gt;
&lt;p&gt;To be honest, it's just what I know. I used it years ago, so I'm familiar with it. My focus right now isn't learning a brand-new provisioning tool. That just isn't aligned with my career goals at this moment.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; When you are a solo dev or a small team, &lt;strong&gt;Velocity &amp;gt; Perfection&lt;/strong&gt;. Always.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;So, good or not, I chose Ansible. It's my baseline. It'll tell me what I set up, and it'll help me install services in minutes instead of hours (like webhooks or configuring logs for Loki — should that take hours? No. But the context-switching kills you). And we can place the configuration in code and commit it in a Git repo.&lt;/p&gt;




&lt;h2&gt;
  
  
  Implementation: OK, let's do this
&lt;/h2&gt;

&lt;p&gt;So, I chose Ansible. Now. How do I use it to provision and configure machines in GCP?&lt;/p&gt;

&lt;p&gt;This is where it got weird. The GCP infra is really different from what I'm used to. The network access is super abstract and hidden. The OS runs differently (it has agents! that configure things from the web console!).&lt;/p&gt;

&lt;p&gt;And everything is about regions and zones. You've seen the docs:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Regions are independent geographic areas that consist of zones. A zone is a deployment area within a region.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Yada yada. Why do I care? &lt;strong&gt;Cost&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This stuff isn't free. If you're self-hosting, you just stack your servers in one datacenter. In GCP, if you have a machine in Europe and one in North America, you're &lt;a href="https://cloud.google.com/vpc/network-pricing?hl=en" rel="noopener noreferrer"&gt;paying&lt;/a&gt; for that data transfer (like $0.05 per GB).&lt;/p&gt;

&lt;p&gt;Even external IPs cost money! A static IP in Belgium is $3.65/month. It's not much, but for 10 VMs? That adds up. And even if a machine has an external IP, that address can be ephemeral: if the machine is stopped and started again (not just restarted), GCP may assign it a new external IP address. Needing an external IP doesn't automatically mean I should reserve a static one.&lt;/p&gt;

&lt;p&gt;This leads to a simple conclusion:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If you don't need an external IP, DON'T ASSIGN ONE.&lt;/strong&gt; It's just wasted money.&lt;/p&gt;
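&lt;p&gt;To put a number on "that adds up", here is a small back-of-the-envelope sketch. It only uses the $3.65/month figure quoted above, which may not match current pricing, and the function name &lt;code&gt;static_ip_cost&lt;/code&gt; is made up for this example.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# Rough cost of reserved static IPs, based on the $3.65/month figure from the text.
# Assumption: every VM gets exactly one static IP at that same price.
# Amounts are kept in cents so plain integer arithmetic is enough.
static_ip_cost() {
  local vm_count="$1"
  local cents_per_ip=365
  local monthly=$(( vm_count * cents_per_ip ))
  local yearly=$(( monthly * 12 ))
  printf 'monthly=$%d.%02d yearly=$%d.%02d\n' \
    $(( monthly / 100 )) $(( monthly % 100 )) \
    $(( yearly / 100 )) $(( yearly % 100 ))
}

# Ten VMs, one static IP each
static_ip_cost 10
```

&lt;p&gt;For 10 VMs this comes to $36.50 per month, or $438.00 per year, just for addresses that may never be needed.&lt;/p&gt;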




&lt;h2&gt;
  
  
  Step 3: The Actual Problem
&lt;/h2&gt;

&lt;p&gt;Okay, so if I follow my own advice and don't assign an external IP, how the hell does Ansible connect to the machine? Ansible needs an address in its inventory file. Where would I find one? Was I wrong to choose Ansible as my provisioning tool? I have Ansible now, so how can I configure machines with it?&lt;/p&gt;

&lt;p&gt;This is where the &lt;strong&gt;gcloud&lt;/strong&gt; command comes in. GCP gives you this tool to manage everything. You can connect to a machine with it even without a public IP:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud compute ssh demo &lt;span class="nt"&gt;--zone&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;europe-north1-a &lt;span class="nt"&gt;--project&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;my-project
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It's basically just magic. It tunnels through or something. I don't care how right now, just that it works.&lt;/p&gt;

&lt;p&gt;So, if gcloud can ssh, maybe &lt;strong&gt;Ansible can use gcloud instead of ssh&lt;/strong&gt;?&lt;/p&gt;




&lt;h2&gt;
  
  
  The Solution: A Wrapper Script
&lt;/h2&gt;

&lt;p&gt;The idea: we tell Ansible:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"When you want to ssh, run this script instead."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;ansible.cfg&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ini"&gt;&lt;code&gt;&lt;span class="nn"&gt;[ssh_connection]&lt;/span&gt;
&lt;span class="py"&gt;ssh_executable&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;gcp-ssh-wrapper.sh&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now we need to build &lt;code&gt;gcp-ssh-wrapper.sh&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The script receives all the arguments Ansible would normally pass to SSH and hands them to &lt;code&gt;gcloud compute ssh&lt;/code&gt; instead.&lt;/p&gt;

&lt;p&gt;The core looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;exec &lt;/span&gt;gcloud compute ssh &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;opts&lt;/span&gt;&lt;span class="p"&gt;[@]&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;host&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--tunnel-through-iap&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--no-user-output-enabled&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--zone&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;zone&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--project&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;project&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--&lt;/span&gt; &lt;span class="nt"&gt;-C&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;cmd&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To get the host and command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;host&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="p"&gt;@&lt;/span&gt;:&lt;span class="p"&gt; -2&lt;/span&gt;:1&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="nv"&gt;cmd&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="p"&gt;@&lt;/span&gt;:&lt;span class="p"&gt; -1&lt;/span&gt;:1&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
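&lt;p&gt;To see what those two expansions pick out, here is a minimal, self-contained sketch. The function name &lt;code&gt;split_ssh_args&lt;/code&gt; and the sample argument list are invented for illustration; the real arguments come from Ansible and will look different.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# Hypothetical helper that mimics how the wrapper reads its positional arguments.
# Assumption: the host is the second-to-last argument and the remote command is the last.
split_ssh_args() {
  local host="${@: -2:1}"   # second-to-last argument
  local cmd="${@: -1:1}"    # last argument
  printf 'host=%s\ncmd=%s\n' "${host}" "${cmd}"
}

# Example call with an invented argument list
split_ssh_args -o ControlMaster=auto -tt demo "echo hello"
```

&lt;p&gt;Note the space before the minus sign: without it, bash would read &lt;code&gt;:-&lt;/code&gt; as the "use a default value" expansion rather than a negative offset.&lt;/p&gt;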



&lt;p&gt;To collect SSH options:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;declare&lt;/span&gt; &lt;span class="nt"&gt;-a&lt;/span&gt; opts
&lt;span class="k"&gt;for &lt;/span&gt;ssh_arg &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="p"&gt;@&lt;/span&gt;:1:&lt;span class="nv"&gt;$#-3&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do
  if&lt;/span&gt; &lt;span class="o"&gt;[[&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;ssh_arg&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="nt"&gt;--&lt;/span&gt;&lt;span class="k"&gt;*&lt;/span&gt; &lt;span class="o"&gt;]]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
    &lt;/span&gt;opts+&lt;span class="o"&gt;=(&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;ssh_arg&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;fi
done&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But... wait. &lt;code&gt;gcloud&lt;/code&gt; needs a zone and a project, and the SSH command line Ansible builds doesn't include them. So the script has to look them up in the Ansible inventory:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;zone&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;ansible-inventory &lt;span class="nt"&gt;-i&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;inventory_path&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;--host&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;host&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; |
    jq &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="s1"&gt;'.zone'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="nv"&gt;project&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;ansible-inventory &lt;span class="nt"&gt;-i&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;inventory_path&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;--host&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;host&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; |
    jq &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="s1"&gt;'.project'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;💾 Grab the Code&lt;br&gt;
Instead of copy-pasting the snippets above and hoping for the best, I've uploaded the final, working &lt;code&gt;gcp-ssh-wrapper.sh&lt;/code&gt; script &lt;a href="https://gist.github.com/empire/00e33a42cf12d1ca7148b5784dc18c4d" rel="noopener noreferrer"&gt;to this GitHub Gist&lt;/a&gt;. Download it, chmod +x it, and you are good to go.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  The Inventory
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ini"&gt;&lt;code&gt;&lt;span class="c"&gt;# inventory/inventory.ini
&lt;/span&gt;&lt;span class="nn"&gt;[project1]&lt;/span&gt;
&lt;span class="err"&gt;demo&lt;/span&gt; &lt;span class="py"&gt;zone&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;europe-north1-a&lt;/span&gt;

&lt;span class="nn"&gt;[project1:vars]&lt;/span&gt;
&lt;span class="py"&gt;project&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;my-project&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now it works.&lt;/p&gt;




&lt;h2&gt;
  
  
  Let's test it
&lt;/h2&gt;

&lt;p&gt;Playbook:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Configure Docker logging&lt;/span&gt;
  &lt;span class="na"&gt;hosts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;all&lt;/span&gt;
  &lt;span class="na"&gt;become&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="na"&gt;gather_facts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;

  &lt;span class="na"&gt;tasks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Template Docker daemon configuration&lt;/span&gt;
      &lt;span class="na"&gt;template&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;src&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;templates/daemon.json.j2&lt;/span&gt;
        &lt;span class="na"&gt;dest&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/etc/docker/daemon.json&lt;/span&gt;
        &lt;span class="na"&gt;owner&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;root&lt;/span&gt;
        &lt;span class="na"&gt;group&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;root&lt;/span&gt;
        &lt;span class="na"&gt;mode&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;0644'&lt;/span&gt;
      &lt;span class="na"&gt;notify&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Restart Docker&lt;/span&gt;
      &lt;span class="na"&gt;tags&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;docker&lt;/span&gt;

  &lt;span class="na"&gt;handlers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Restart Docker&lt;/span&gt;
      &lt;span class="na"&gt;service&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;docker&lt;/span&gt;
        &lt;span class="na"&gt;state&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;restarted&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Template:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="err"&gt;%&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;loki_enabled&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;%&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"log-driver"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"loki"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"log-opts"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"loki-url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"{{ loki_url }}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"loki-batch-size"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"400"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="err"&gt;%&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;else&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;%&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"log-driver"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"json-file"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"log-opts"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"max-size"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"300m"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"max-file"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="err"&gt;%&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;endif&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;%&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Vars:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;loki_enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="na"&gt;loki_url&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://10.166.0.20:3100/loki/api/v1/push"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ansible-playbook &lt;span class="nt"&gt;-i&lt;/span&gt; inventory/ &lt;span class="nt"&gt;--diff&lt;/span&gt; &lt;span class="nt"&gt;-l&lt;/span&gt; demo playbook_docker.yml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Important Note on Refinement
&lt;/h2&gt;

&lt;p&gt;Before you copy-paste &lt;a href="https://gist.github.com/empire/00e33a42cf12d1ca7148b5784dc18c4d" rel="noopener noreferrer"&gt;this&lt;/a&gt; into production and yell at me: I know this is a &lt;a href="https://en.wikipedia.org/wiki/Proof_of_concept" rel="noopener noreferrer"&gt;Proof of Concept&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;I'm aware that querying &lt;code&gt;ansible-inventory&lt;/code&gt; twice per connection is a performance bottleneck. I also know a true "Enterprise" setup would use the official &lt;code&gt;google.cloud.gcp_compute&lt;/code&gt; plugin for dynamic inventory and leverage Service Accounts for CI/CD.&lt;/p&gt;

&lt;p&gt;However, this approach trades 'Best Practice' complexity for immediate velocity. When you are a solo dev just trying to ship features, that is a trade worth making.&lt;/p&gt;

&lt;p&gt;That said, here is my &lt;a href="https://en.wikipedia.org/wiki/Technical_debt" rel="noopener noreferrer"&gt;Technical Debt&lt;/a&gt; list to tackle later:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Speed:&lt;/strong&gt; Caching the inventory lookups to stop spawning processes.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Scale:&lt;/strong&gt; Handling multi-region/project setups without hardcoded zones.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Resilience:&lt;/strong&gt; Better error handling for when &lt;code&gt;gcloud&lt;/code&gt; hiccups.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
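&lt;p&gt;For the "Speed" item, one possible direction is a tiny file-based cache around the lookup. This is only a sketch under assumptions: the &lt;code&gt;cached_output&lt;/code&gt; function and the cache directory are hypothetical and not part of the Gist, entries never expire, and the key is used as a file name, so it must not contain slashes.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# Hypothetical cache helper; the directory name and function name are assumptions.
cache_dir="${TMPDIR:-/tmp}/gcp-ssh-wrapper-cache"

# cached_output KEY COMMAND [ARGS...]
# Runs COMMAND only when no non-empty cached copy exists for KEY.
# Either way, the output is printed on stdout.
cached_output() {
  local key="$1"
  shift
  local cache_file="${cache_dir}/${key}"
  mkdir -p "${cache_dir}"
  if [[ ! -s "${cache_file}" ]]; then
    # First call for this key: run the command, print and store its output.
    "$@" | tee "${cache_file}"
  else
    cat "${cache_file}"
  fi
}

# Possible use inside the wrapper (left as a comment because it needs Ansible and jq):
#   host_vars=$(cached_output "${host}" ansible-inventory -i "${inventory_path}" --host "${host}")
#   zone=$(printf '%s' "${host_vars}" | jq -r '.zone')
#   project=$(printf '%s' "${host_vars}" | jq -r '.project')
```

&lt;p&gt;With something like this, &lt;code&gt;ansible-inventory&lt;/code&gt; would run once per host instead of twice per connection. The catch is that the cache has to be cleared by hand whenever the inventory changes.&lt;/p&gt;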

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;So, the job is... done? Kinda.&lt;/p&gt;

&lt;p&gt;I'm left with a few more questions to explore in the next part of this series:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The Magic Tunnel:&lt;/strong&gt; How exactly does IAP (Identity-Aware Proxy) work under the hood?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Name Collisions:&lt;/strong&gt; What if two machines in different projects share the same name?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The "Cloudflare" Layer:&lt;/strong&gt; There are some blackbox areas regarding how this interacts with Cloudflare that I need to document.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Anyway, that's where I am. It's a hack, but it's my hack, and it works.&lt;/p&gt;

&lt;p&gt;This article took nearly a month to write, test, and clean up. I'm really glad I started using Ansible; it's solving problems I didn't even know I had. If you've struggled with similar manual-ops dizziness, let me know in the comments.&lt;/p&gt;

</description>
      <category>softwareengineering</category>
      <category>devops</category>
      <category>career</category>
      <category>automation</category>
    </item>
    <item>
      <title>Too Much Talent: Why Building a Great Team Isn't Just About Seniority</title>
      <dc:creator>Hossein Zolfi</dc:creator>
      <pubDate>Tue, 13 May 2025 11:35:01 +0000</pubDate>
      <link>https://dev.to/empire/too-much-talent-why-building-a-great-team-isnt-just-about-seniority-2126</link>
      <guid>https://dev.to/empire/too-much-talent-why-building-a-great-team-isnt-just-about-seniority-2126</guid>
      <description>&lt;h2&gt;
  
  
  From Stars to Team
&lt;/h2&gt;

&lt;p&gt;Before I became an engineering manager, when I was just reading software engineering books, I believed the surest path to success was hiring as many top-tier engineers as possible. My logic was simple: great software design comes from great engineers. A colleague once told me, &lt;em&gt;"Hire stars: they'll build game-changing features!"&lt;/em&gt; And often, they did. These engineers introduced powerful designs, elegant abstractions, and impressive capabilities that elevated our products.&lt;/p&gt;

&lt;p&gt;But over time, I saw the limits of that approach. Debates dragged on. Unpopular tasks went unowned. Engineers left because they felt underutilized. After reflecting on what I saw in my own teams—and what I learned from others—I realized:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Focusing solely on individual brilliance—even when it delivers great features—often sacrifices teamwork, consistency, and long-term growth.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The real magic happens when the team works as a cohesive unit. The goal isn't to collect stars—it's to build a constellation.&lt;/p&gt;




&lt;p&gt;When building teams—whether in sports or software—the instinct is to stack the roster with as much talent as possible. More senior engineers, more superstars, more wins, right? Not always. Research and real-world experience suggest a more nuanced truth: too much talent can actually hinder performance. Even when individual engineers introduce brilliant features, the lack of balance and collaboration can slow the team down. Here's why—and how to build a team that thrives.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Research: Talent Has Diminishing Returns
&lt;/h2&gt;

&lt;p&gt;A 2014 study by Swaab, Schaerer, Anicich, Ronay, and Galinsky analyzed highly interdependent team sports like soccer and basketball. They found that adding talent improves performance, but only up to a point. Beyond that, performance suffers. Too many stars can lead to coordination breakdowns, ego clashes, and poor collective results.&lt;/p&gt;

&lt;p&gt;This challenges the assumption that talent and output scale linearly. In interdependent teams—like most software engineering groups—excessive star power can actually disrupt the very collaboration needed to succeed.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Key takeaway:&lt;/strong&gt; Talent drives performance, but only if it doesn't undermine teamwork.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Why This Matters for Software Teams
&lt;/h2&gt;

&lt;p&gt;Software engineering is a team sport. Engineers must align on architecture, review each other's work, and ship features together. A team made up solely of senior engineers or high-performers can run into serious problems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Decision Gridlock:&lt;/strong&gt; Strong opinions may clash over tech choices or system design, delaying progress.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Neglected Execution:&lt;/strong&gt; Critical but unglamorous tasks—like CI/CD setup, writing documentation, or fixing edge-case bugs—may get ignored.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Leadership Friction:&lt;/strong&gt; Too many leaders without clear roles often compete instead of collaborate.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Blocked Growth:&lt;/strong&gt; Without mid-level or junior engineers, there's no natural path for mentorship or long-term team development.&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;Even when talented engineers ship great features, a lack of cohesion or clarity in roles can erode team efficiency and morale.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Real-World Examples: Talent Alone Isn't Enough
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Google's Early Struggles
&lt;/h3&gt;

&lt;p&gt;In the early 2010s, Google's engineering teams were packed with elite talent. Yet, some teams struggled to ship:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Endless architecture debates delayed progress.&lt;/li&gt;
&lt;li&gt;Glue work like documentation and infrastructure lagged behind.&lt;/li&gt;
&lt;li&gt;Junior engineers felt sidelined, which hurt morale and retention.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Google eventually recognized that technical brilliance wasn't enough. They began emphasizing soft skills—collaboration, humility, team orientation—in hiring and promotions. They also formalized the role of the Tech Lead to create alignment and speed up decisions.&lt;/p&gt;

&lt;h3&gt;
  
  
  Node.js and Open Source Governance
&lt;/h3&gt;

&lt;p&gt;Node.js, in its early days, suffered from too many strong voices pushing in different directions. Conflicts and lack of governance stalled progress. The project rebounded only after implementing clearer roles and decision-making processes. This empowered contributors at all levels and helped the project scale.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;These examples show that raw talent is not a silver bullet. Without structure and shared purpose, even the best engineers can struggle to work effectively together.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  The Role of Tech Leads in Balancing Talent
&lt;/h2&gt;

&lt;p&gt;A well-balanced team still needs guidance. That's where Tech Leads come in. The Tech Lead isn't the smartest person in the room—they're the one who keeps the room aligned.&lt;/p&gt;

&lt;p&gt;Before I became a team lead, I didn't fully understand this role. In my past companies, the Tech Lead was treated as the boss—someone who went to meetings, made decisions alone, and became disconnected from the team. Often, these leads lacked strong software instincts and sometimes introduced friction with senior engineers. This was far from the kind of leadership that fosters a healthy engineering culture.&lt;/p&gt;

&lt;p&gt;Now I realize:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;A Tech Lead is not a boss. A good Tech Lead is a coordinator and a listener.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The role is about creating alignment—not control. It's about ensuring that everyone contributes, conflicts are resolved constructively, and decisions move forward without dragging the team down.&lt;/p&gt;

&lt;p&gt;An effective Tech Lead:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Guides Without Dominating:&lt;/strong&gt; They set direction while making space for team input.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Resolves Conflict:&lt;/strong&gt; They facilitate decisions so the team doesn't stall.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Leverages All Levels:&lt;/strong&gt; They ensure seniors mentor, mid-levels execute, and juniors grow.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Focuses on Outcomes:&lt;/strong&gt; They prioritize what's best for the product—not their own technical legacy.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Spotify's Squad Leads operate in this mold, helping cross-functional teams stay aligned while maintaining speed and cohesion.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Without a Tech Lead—or with one driven by ego—teams of superstars often descend into dysfunction.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  How to Build a Balanced Team: Practical Tips
&lt;/h2&gt;

&lt;p&gt;If talent alone isn't the answer, what is? Here are some actionable ways to build a high-performing team:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Hire T-Shaped Engineers:&lt;/strong&gt; Seek people with deep expertise in one area, but enough breadth to collaborate across functions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Assess Gaps Before Hiring:&lt;/strong&gt; Need innovation? Hire a senior designer or architect. Need delivery? Mid-levels are crucial. Need culture? Add a mentor.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prioritize Soft Skills:&lt;/strong&gt; Interview for humility, communication, and collaboration—not just algorithms and system design.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Foster Mentorship:&lt;/strong&gt; Pair seniors with juniors. This builds culture and spreads knowledge.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Empower a Tech Lead:&lt;/strong&gt; Choose someone who values team success over individual heroics. Support their growth with training in facilitation and leadership.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  A Fun Metaphor: The LeBron James Problem
&lt;/h2&gt;

&lt;p&gt;You might call this the &lt;strong&gt;LeBron James Problem&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You'll wait forever trying to hire three LeBron Jameses—they're rare and expensive.&lt;/li&gt;
&lt;li&gt;Even if you succeed, they may clash instead of cooperating.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Software teams aren't basketball fantasy drafts. Obsessing over rockstar hires wastes time and breeds dysfunction. A team of role players with clear leadership often outperforms a group of misaligned all-stars.&lt;/p&gt;




&lt;h2&gt;
  
  
  Final Thoughts: Build a Constellation, Not a Cluster of Stars
&lt;/h2&gt;

&lt;p&gt;Talent absolutely matters. Superstar engineers can introduce groundbreaking features. But once you reach a certain threshold, adding more talent without balancing the team dynamic can backfire.&lt;/p&gt;

&lt;p&gt;To build a truly great team:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Seek balance:&lt;/strong&gt; Mix senior, mid-level, and junior engineers.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prioritize teamwork:&lt;/strong&gt; Hire for compatibility as well as capability.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Support leadership:&lt;/strong&gt; Use Tech Leads to align the team and resolve friction.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Value the whole:&lt;/strong&gt; Celebrate team outcomes over individual accomplishments.&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Don't build a cluster of lone stars. Build a constellation—where everyone shines together.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>softwareengineering</category>
      <category>teambuilding</category>
      <category>leadership</category>
    </item>
    <item>
      <title>Remote debugging Go App</title>
      <dc:creator>Hossein Zolfi</dc:creator>
      <pubDate>Sat, 12 Oct 2024 13:59:06 +0000</pubDate>
      <link>https://dev.to/empire/remote-debugging-go-app-1nml</link>
      <guid>https://dev.to/empire/remote-debugging-go-app-1nml</guid>
      <description>&lt;p&gt;For the longest time, I wasn’t a fan of debuggers. Coming from a background in Spring Framework, Java, Python, and PHP (Symfony/Laravel), I always found logs and traces more reliable for debugging. I had even dabbled with GDB in the early days, but it didn’t stick. Instead, I relied on logging to figure out what my applications were doing.&lt;/p&gt;

&lt;p&gt;However, my perspective changed when I started working with Go. While developing in Kubernetes and working on microservices, I faced a situation where the complexity of the service made logging and tests insufficient. This was a large service with multiple dependencies, serving both users and operators, making bug fixes particularly challenging. &lt;/p&gt;

&lt;p&gt;Let me share a recent experience where using a debugger saved me a significant amount of time.&lt;/p&gt;

&lt;h3&gt;
  
  
  Debugging in Kubernetes: A Real-World Example
&lt;/h3&gt;

&lt;p&gt;A few weeks ago, I was developing a feature that affected multiple parts of a microservice running in Kubernetes. This service was being called by several other components (like the app and admin panels), and I needed to trace what was happening inside the service, step by step. Tests couldn’t cover everything, especially when I needed to ensure that certain flags were set correctly to prevent sending multiple SMS notifications to users.&lt;/p&gt;

&lt;p&gt;I could have used logs to trace the service, but every time I missed an entry, I would have had to add a new one, push the code, and wait for the CI pipeline to complete—wasting 3-5 minutes each time just to deploy to staging.&lt;/p&gt;

&lt;p&gt;Instead, I decided to try using a debugger. Here’s how I set it up.&lt;/p&gt;

&lt;h3&gt;
  
  
  Setting Up Delve Debugger in a Kubernetes Pod
&lt;/h3&gt;

&lt;p&gt;After a lot of trial and error, I found that using the following entrypoint in the Dockerfile allowed me to run Delve (a debugger for Go) inside the Kubernetes Pod:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;ENTRYPOINT&lt;/span&gt;&lt;span class="s"&gt; ["/bin/sh", "-c", "/go/bin/dlv --listen=127.0.0.1:8001 --headless=true --api-version=2 --only-same-user=false exec /path/to/exec"]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To install Delve, I added this line to the Dockerfile:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;RUN &lt;/span&gt;go &lt;span class="nb"&gt;install &lt;/span&gt;github.com/go-delve/delve/cmd/dlv@v1.23.0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, I compiled the Go application with specific flags to disable optimizations and inlining:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;go build &lt;span class="nt"&gt;-gcflags&lt;/span&gt; &lt;span class="s2"&gt;"all=-N -l"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Connecting to the Debugger
&lt;/h3&gt;

&lt;p&gt;Once the service was deployed, I used port forwarding to connect to the debugger. I added the following port configuration to the Kubernetes manifest:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;dlv&lt;/span&gt;
    &lt;span class="na"&gt;containerPort&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;8001&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then, I forwarded the port to my local machine:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;oc port-forward &lt;span class="si"&gt;$(&lt;/span&gt;oc get pods &lt;span class="nt"&gt;-l&lt;/span&gt; &lt;span class="nv"&gt;app&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;LABEL &lt;span class="nt"&gt;-o&lt;/span&gt; name | &lt;span class="nb"&gt;head&lt;/span&gt; &lt;span class="nt"&gt;-n&lt;/span&gt; 1&lt;span class="si"&gt;)&lt;/span&gt; 8001:8001
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now I could connect to the debugger from my local machine using:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;dlv connect :8001
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Setting Breakpoints and Configuring Paths
&lt;/h3&gt;

&lt;p&gt;With the debugger connected, I set breakpoints, such as:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="o"&gt;(&lt;/span&gt;dlv&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="nb"&gt;break &lt;/span&gt;main.main
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Because the service was built in a different environment (GitLab’s pipeline), the source paths recorded in the binary didn’t match my local setup. To solve this, I used &lt;code&gt;substitute-path&lt;/code&gt; to map the build-time paths to my local ones:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="o"&gt;(&lt;/span&gt;dlv&lt;span class="o"&gt;)&lt;/span&gt; config substitute-path /usr/local/go/ /Users/hossein/sdk/go1.23.1/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Running these commands every time was tedious, so I put them in an init file (&lt;code&gt;dlv.init&lt;/code&gt;) and passed it to Delve with &lt;code&gt;--init&lt;/code&gt; to automate the setup:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;dlv connect :8001 &lt;span class="nt"&gt;--init&lt;/span&gt; dlv.init
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Explanation of Key Commands
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Delve Debugger Command:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;dlv &lt;span class="nt"&gt;--listen&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;127.0.0.1:8001 &lt;span class="nt"&gt;--headless&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt; &lt;span class="nt"&gt;--api-version&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;2 &lt;span class="nt"&gt;--only-same-user&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;false exec&lt;/span&gt; /path/to/exec
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;--listen&lt;/code&gt;: Binds the debugger's API server to &lt;code&gt;127.0.0.1:8001&lt;/code&gt;. Because this is a loopback address, it is reachable from outside the Pod only through port forwarding.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;--headless=true&lt;/code&gt;: Runs Delve without an interactive UI, suitable for remote debugging.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;--api-version=2&lt;/code&gt;: Uses API version 2 for better tool compatibility.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;--only-same-user=false&lt;/code&gt;: Allows users other than the process owner to connect.&lt;/li&gt;
&lt;li&gt;
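&lt;code&gt;exec /path/to/exec&lt;/code&gt;: Launches the precompiled Go executable under the debugger (attaching to an already running process is a separate command, &lt;code&gt;dlv attach&lt;/code&gt;).&lt;/li&gt;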
&lt;/ul&gt;

&lt;p&gt;This setup allows remote debugging of a Go service in a Kubernetes environment.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Go Build Command:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;go build &lt;span class="nt"&gt;-gcflags&lt;/span&gt; &lt;span class="s2"&gt;"all=-N -l"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;-N&lt;/code&gt;: Disables optimizations.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;-l&lt;/code&gt;: Prevents inlining of functions.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These flags ensure the code stays closer to the source, making it easier to debug by allowing you to step through every function.&lt;/p&gt;
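
&lt;p&gt;One way to confirm that a deployed binary really was built with these flags is to read the build settings Go embeds in it. The sketch below is only an illustration and not part of the original setup: &lt;code&gt;hasDebugFlags&lt;/code&gt; is a hypothetical helper, and it assumes a Go 1.18+ toolchain that records the &lt;code&gt;-gcflags&lt;/code&gt; build setting with the value exactly as it was passed to &lt;code&gt;go build&lt;/code&gt;.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;package main

import (
	"fmt"
	"runtime/debug"
	"strings"
)

// hasDebugFlags reports whether a -gcflags value contains both -N and -l.
// It is a hypothetical helper written for this example.
func hasDebugFlags(gcflags string) bool {
	seen := map[string]bool{}
	for _, f := range strings.Fields(gcflags) {
		// Drop a leading package pattern such as "all=".
		if !strings.HasPrefix(f, "-") {
			if _, rest, ok := strings.Cut(f, "="); ok {
				f = rest
			}
		}
		seen[f] = true
	}
	if !seen["-N"] {
		return false
	}
	return seen["-l"]
}

func main() {
	// ReadBuildInfo returns the settings recorded when this binary was built.
	info, ok := debug.ReadBuildInfo()
	if !ok {
		fmt.Println("no build info embedded in this binary")
		return
	}
	gcflags := ""
	for _, s := range info.Settings {
		if s.Key == "-gcflags" {
			gcflags = s.Value
		}
	}
	fmt.Printf("gcflags=%q debug-friendly=%v\n", gcflags, hasDebugFlags(gcflags))
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The same settings can be inspected without writing any code by running &lt;code&gt;go version -m /path/to/exec&lt;/code&gt; against the binary.&lt;/p&gt;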

&lt;h3&gt;
  
  
  Example Usage
&lt;/h3&gt;

&lt;p&gt;To demonstrate how to debug a remote service, I'll use a local Docker container for simplicity; in practice, the service would be deployed on Kubernetes.&lt;/p&gt;

&lt;p&gt;To build the Docker image, run the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;make build
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once the image is built, you can start the service by running:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;make start
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;make start
docker run &lt;span class="nt"&gt;-it&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; 8001:8001 &lt;span class="nt"&gt;-p&lt;/span&gt; 8080:8080 &lt;span class="nt"&gt;--rm&lt;/span&gt; my-app
API server listening at: &lt;span class="o"&gt;[&lt;/span&gt;::]:8001
2024-10-12T13:48:31Z warning &lt;span class="nv"&gt;layer&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;rpc Listening &lt;span class="k"&gt;for &lt;/span&gt;remote connections &lt;span class="o"&gt;(&lt;/span&gt;connections are not authenticated nor encrypted&lt;span class="o"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The output shows that the debugger is running and listening on port 8001. To attach the debugger to the remote service, run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;make debug
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;make debug
bash &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s2"&gt;"dlv connect :8001 --init &amp;lt;(sed 's|PWD|'&lt;/span&gt;&lt;span class="sb"&gt;`&lt;/span&gt;&lt;span class="nb"&gt;pwd&lt;/span&gt;&lt;span class="sb"&gt;`&lt;/span&gt;&lt;span class="s2"&gt;'|g; s|HOME|'&lt;/span&gt;&lt;span class="nv"&gt;$HOME&lt;/span&gt;&lt;span class="s2"&gt;'|g' dlv.init)"&lt;/span&gt;
Type &lt;span class="s1"&gt;'help'&lt;/span&gt; &lt;span class="k"&gt;for &lt;/span&gt;list of commands.
Breakpoint 1 &lt;span class="nb"&gt;set &lt;/span&gt;at 0x2188cc &lt;span class="k"&gt;for &lt;/span&gt;main.main.func1&lt;span class="o"&gt;()&lt;/span&gt; ./main.go:9
&lt;span class="o"&gt;(&lt;/span&gt;dlv&lt;span class="o"&gt;)&lt;/span&gt; c
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
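
&lt;p&gt;The &lt;code&gt;sed&lt;/code&gt; call in that output replaces the literal tokens &lt;code&gt;PWD&lt;/code&gt; and &lt;code&gt;HOME&lt;/code&gt; in &lt;code&gt;dlv.init&lt;/code&gt; with the current directory and the home directory before Delve reads the file. Here is a minimal Go sketch of the same substitution. The template contents, the &lt;code&gt;/app&lt;/code&gt; build path, and the &lt;code&gt;renderInit&lt;/code&gt; name are assumptions made for illustration, since the actual &lt;code&gt;dlv.init&lt;/code&gt; is not reproduced in this post.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;package main

import (
	"fmt"
	"os"
	"strings"
)

// initTemplate is a hypothetical dlv.init: Delve commands, one per line,
// with PWD and HOME left as placeholders. The /app build path is assumed.
const initTemplate = `config substitute-path /app PWD
config substitute-path /usr/local/go/ HOME/sdk/go1.23.1/
break main.go:9
`

// renderInit mirrors sed 's|PWD|...|g; s|HOME|...|g': it replaces PWD
// first and HOME second, in that order.
func renderInit(tmpl, pwd, home string) string {
	out := strings.ReplaceAll(tmpl, "PWD", pwd)
	return strings.ReplaceAll(out, "HOME", home)
}

func main() {
	pwd, err := os.Getwd()
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot determine working directory:", err)
		os.Exit(1)
	}
	// Print the rendered script; it could be saved and passed to dlv --init.
	fmt.Print(renderInit(initTemplate, pwd, os.Getenv("HOME")))
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Delve runs the rendered lines as ordinary commands, so anything you would type at the &lt;code&gt;(dlv)&lt;/code&gt; prompt can go in the file.&lt;/p&gt;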



&lt;p&gt;Once attached, enter the &lt;code&gt;c&lt;/code&gt; (continue) command to resume execution. Then, send a request to the service using the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;make send-request
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The debugger will show output like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;Breakpoint 1] main.main.func1&lt;span class="o"&gt;()&lt;/span&gt; ./main.go:9 &lt;span class="o"&gt;(&lt;/span&gt;hits goroutine&lt;span class="o"&gt;(&lt;/span&gt;17&lt;span class="o"&gt;)&lt;/span&gt;:1 total:1&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;PC: 0x2188cc&lt;span class="o"&gt;)&lt;/span&gt;
Warning: debugging optimized &lt;span class="k"&gt;function
&lt;/span&gt;Warning: listing may not match stale executable
     4:     &lt;span class="s2"&gt;"fmt"&lt;/span&gt;
     5:     &lt;span class="s2"&gt;"net/http"&lt;/span&gt;
     6: &lt;span class="o"&gt;)&lt;/span&gt;
     7:
     8: func main&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;   9:     http.HandleFunc&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"/"&lt;/span&gt;, func&lt;span class="o"&gt;(&lt;/span&gt;w http.ResponseWriter, r &lt;span class="k"&gt;*&lt;/span&gt;http.Request&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    10:         fmt.Fprintf&lt;span class="o"&gt;(&lt;/span&gt;w, &lt;span class="s2"&gt;"Hello, World!"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    11:     &lt;span class="o"&gt;})&lt;/span&gt;
    12:
    13:     http.ListenAndServe&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;":8080"&lt;/span&gt;, nil&lt;span class="o"&gt;)&lt;/span&gt;
    14: &lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At this point, you can manually debug the service. For example, to inspect the HTTP method, use:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="o"&gt;(&lt;/span&gt;dlv&lt;span class="o"&gt;)&lt;/span&gt; p r.Method
&lt;span class="s2"&gt;"GET"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once troubleshooting is complete, use the &lt;code&gt;c&lt;/code&gt; command to continue, and the client will receive its response.&lt;/p&gt;

&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;Using a debugger in this case allowed me to step through the code and understand the service's behavior without constantly pushing new logging code. It was a game-changer, especially when working with complex microservices in Kubernetes. Debugging a Go service running in a remote cluster can be daunting, but with the right setup, it can save you a lot of time and frustration.&lt;/p&gt;

&lt;p&gt;You can check out the full example of this setup on &lt;a href="https://github.com/empire/go-dlv-k8s" rel="noopener noreferrer"&gt;my GitHub&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>go</category>
      <category>dlv</category>
      <category>microservices</category>
    </item>
  </channel>
</rss>
