<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Thomas Strömberg</title>
    <description>The latest articles on DEV Community by Thomas Strömberg (@tstromberg).</description>
    <link>https://dev.to/tstromberg</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F533806%2F1ae53723-dd46-4f83-97f0-eaa9c3379e9a.jpg</url>
      <title>DEV Community: Thomas Strömberg</title>
      <link>https://dev.to/tstromberg</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/tstromberg"/>
    <language>en</language>
    <item>
      <title>The anatomy of a great playbook entry</title>
      <dc:creator>Thomas Strömberg</dc:creator>
      <pubDate>Fri, 21 May 2021 00:07:50 +0000</pubDate>
      <link>https://dev.to/tstromberg/the-anatomy-of-a-great-playbook-entry-35od</link>
      <guid>https://dev.to/tstromberg/the-anatomy-of-a-great-playbook-entry-35od</guid>
      <description>&lt;p&gt;What if you could easily reduce the length of outages by 3X?&lt;/p&gt;

&lt;p&gt;According to the &lt;a href="https://sre.google/sre-book/introduction/"&gt;SRE book&lt;/a&gt;, "recording the best practices ahead of time in a playbook produces roughly a 3x improvement in MTTR".  This improvement mirrors my experience with well-written playbooks. &lt;/p&gt;

&lt;p&gt;So what makes a playbook entry "great"?&lt;/p&gt;

&lt;h1&gt;
  
  
  Philosophy
&lt;/h1&gt;

&lt;p&gt;Remember how you felt in your first on-call rotation, when you were paged at 3am for a system you barely understood? &lt;strong&gt;Write your playbook entries for that person.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Playbooks should provide just enough context to confidently work through an incident, without providing extraneous content that will be a burden to keep up-to-date.&lt;/p&gt;

&lt;p&gt;Be wary of playbooks that offer exact remediation steps: these are often a sign of sacrificing human blood to a system that should be automated.&lt;/p&gt;

&lt;h1&gt;
  
  
  Discovery
&lt;/h1&gt;

&lt;p&gt;Alerts should always include the relevant playbook URL. Otherwise, you invite human error: the responder may follow the wrong playbook.&lt;/p&gt;

&lt;p&gt;Consider including the alert name in the playbook URL to make it easier to find. This also lets some alerting systems fill in the URL from a template, for example: &lt;code&gt;https://playbooks/%%ALERT_NAME%%&lt;/code&gt;&lt;/p&gt;
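
&lt;p&gt;Here is a rough illustration of that substitution as a minimal Go sketch. The &lt;code&gt;playbookURL&lt;/code&gt; helper, the &lt;code&gt;playbooks&lt;/code&gt; host and the &lt;code&gt;DiskSpaceLow&lt;/code&gt; alert name are hypothetical. Real alerting systems each have their own templating syntax, so treat this as a sketch of the idea rather than any tool's API:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;package main

import (
	"fmt"
	"net/url"
	"strings"
)

// playbookTemplate mirrors the example URL above; the host is illustrative.
const playbookTemplate = "https://playbooks/%%ALERT_NAME%%"

// playbookURL fills the alert name into the template. PathEscape turns
// spaces into %20 and slashes into %2F so the name stays one path segment.
func playbookURL(alertName string) string {
	return strings.ReplaceAll(playbookTemplate, "%%ALERT_NAME%%", url.PathEscape(alertName))
}

func main() {
	// Prints https://playbooks/DiskSpaceLow
	fmt.Println(playbookURL("DiskSpaceLow"))
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Escaping the name means an alert name containing spaces or slashes cannot produce a broken or misleading link.&lt;/p&gt;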

&lt;h1&gt;
  
  
  Structure
&lt;/h1&gt;

&lt;p&gt;Playbooks are easiest to scan in an emergency when they share a consistent structure. The best structure depends heavily on your team's culture, but this is what has worked for me:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Severity&lt;/strong&gt;: How to assess the criticality of this alert from your team's point of view. Is it a slow-burning issue that generates tickets, a critical paging event, or does the severity depend on the duration?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Impact&lt;/strong&gt;: How are your customers impacted by this alert? Often a one-liner, for example: "None immediately. If ignored, may result in revenue-impacting customer provisioning failures due to resource exhaustion"&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Metrics&lt;/strong&gt;: 1-2 graphs showing the impact, its duration, and whether the effect is worsening. Inline live-updating graphs work best, as they can keep the incident responder from making unnecessary changes when the problem is already dissipating. Hyperlinks are nearly as good.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Background&lt;/strong&gt;: What should a new person on the on-call rotation know about this system? Be terse, providing a hyperlink for more information and/or an architectural diagram. To reduce maintenance burden and cognitive load during incident response, share this section between multiple playbook entries via templating.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Mitigation&lt;/strong&gt;: What are the recommended steps to mitigate the issue? This is often a checklist and may include steps for rolling back or redirecting traffic.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Debugging&lt;/strong&gt;: How should one get started digging into why this alert is firing? For example:&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; 1. Check for recent fatal error messages:
 2. Check the cluster for free disk space:
 3. Check &amp;lt;url&amp;gt; to see when the last release went out
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;References&lt;/strong&gt;: Links to the alert configuration, or code that generates the metric used by the alert, can be useful in understanding the underlying behavior. Post-mortems can also be valuable.&lt;/li&gt;
&lt;/ul&gt;
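
&lt;p&gt;One way to share a section such as Background between entries is to define it once as a template and include it in every entry. The following Go sketch uses the standard &lt;code&gt;text/template&lt;/code&gt; package. The &lt;code&gt;Entry&lt;/code&gt; type, its fields and the sample content are hypothetical, and only some of the sections above are shown:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;package main

import (
	"fmt"
	"strings"
	"text/template"
)

// Entry holds a few of the per-alert sections described above.
// The type and field names are hypothetical.
type Entry struct {
	Alert      string
	Severity   string
	Impact     string
	Mitigation []string
}

// background is defined once and included by every entry for the same
// system, so there is only one copy to keep up to date.
const background = `{{define "background"}}Background: one-line system summary plus a link to the architecture diagram.{{end}}`

// entryTmpl lays out the sections in the same order for every alert.
const entryTmpl = `# {{.Alert}}
Severity: {{.Severity}}
Impact: {{.Impact}}
{{template "background"}}
Mitigation:
{{range .Mitigation}}- {{.}}
{{end}}`

// render executes the entry template, pulling in the shared background.
func render(e Entry) (string, error) {
	tmpl, err := template.New("entry").Parse(background + entryTmpl)
	if err != nil {
		return "", err
	}
	b := new(strings.Builder)
	if err := tmpl.Execute(b, e); err != nil {
		return "", err
	}
	return b.String(), nil
}

func main() {
	out, err := render(Entry{
		Alert:      "DiskSpaceLow",
		Severity:   "Ticket; page if it persists for an hour",
		Impact:     "None immediately.",
		Mitigation: []string{"Check the cluster for free disk space"},
	})
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Because every entry renders from the same layout, responders always find the sections in the same place. Editing &lt;code&gt;background&lt;/code&gt; once updates all of the entries that include it.&lt;/p&gt;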

&lt;h1&gt;
  
  
  Formatting
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;Be concise.&lt;/li&gt;
&lt;li&gt;Use bulleted or numbered lists instead of paragraphs.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The &lt;a href="https://kubernetes.io/docs/contribute/style/style-guide/"&gt;Kubernetes Documentation Style Guide&lt;/a&gt;  has great recommendations for technical documentation, but the most important for playbooks is: &lt;strong&gt;make your commands trivial to copy and paste.&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Do not include the command prompt.

&lt;ul&gt;
&lt;li&gt;See: &lt;a href="https://tanelpoder.com/posts/how-to-stay-safe-in-shell/"&gt;data loss due to &amp;gt; character in prompt&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;li&gt;Separate commands from example output.&lt;/li&gt;
&lt;li&gt;Do not include real but unrelated host, site, or cluster names in your example commands.

&lt;ul&gt;
&lt;li&gt;I once saw an outage spread when a responder copied an example command with the intent to edit the hostnames before pressing enter. They pressed enter first.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Maintenance
&lt;/h1&gt;

&lt;p&gt;Keep playbooks up to date by:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Regularly scheduled &lt;a href="https://sre.google/sre-book/accelerating-sre-on-call/"&gt;"Wheel of Misfortune" role-playing game sessions&lt;/a&gt;, where the previous on-call engineer walks the current on-call engineer through a pager response scenario.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Post-mortem action items that suggest playbook updates to decrease the resolution time for future pages for the same alert.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Big-bang efforts, such as auditing every playbook for relevance, are best done once, up front, to bring all of the playbooks into the same structure. I have never seen quarterly playbook reviews work.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Special thanks to &lt;a href="https://josebiro.medium.com/"&gt;Joseph Bironas&lt;/a&gt; for editorial feedback and ideas for this article.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>sre</category>
      <category>devops</category>
      <category>monitoring</category>
    </item>
    <item>
      <title>Motivating Software Engineering Teams</title>
      <dc:creator>Thomas Strömberg</dc:creator>
      <pubDate>Tue, 09 Mar 2021 22:52:54 +0000</pubDate>
      <link>https://dev.to/tstromberg/motivating-software-engineering-teams-n1n</link>
      <guid>https://dev.to/tstromberg/motivating-software-engineering-teams-n1n</guid>
      <description>&lt;p&gt;&lt;strong&gt;Empathy, purpose, craftsmanship.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Empathy
&lt;/h2&gt;

&lt;p&gt;The key to motivating a team is to identify what motivates the people that make up the team, with enough empathy to put yourself in their shoes.&lt;/p&gt;

&lt;p&gt;Everyone wants to be happy, but everyone has their own path to happiness. Learning the career and life goals of everyone on the team lets you line up the right tasks for them at the right time.&lt;/p&gt;

&lt;p&gt;Asking people directly, "What motivates you as a software engineer?" will often unlock the right set of hints for how to frame messages in a way that works for them. However, to get an honest and complete answer, one needs to build rapport.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Building rapport&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Building rapport with the team is critical to establishing the psychological safety required for folks to feel comfortable sharing their real thoughts. My advice is to model and build a real personal connection that supersedes the synthetic relationship of the manager to the direct report.&lt;/p&gt;

&lt;p&gt;My strategy for building connection is: demonstrating care, showing vulnerability, and deep listening. Proving that you care is something that you cannot fake. If you cannot care deeply for each person on your team, you will fail to motivate your team in the long term. &lt;/p&gt;

&lt;p&gt;A technique I have used successfully is to declare to each person my focus: it's them, and their long-term career. As a manager, the people you inspire are the legacy you will leave behind. This advice may seem antithetical to most business guidance, but if you genuinely care for your team, you will be in a better place to inspire them. After all, as humans, we are more important than the companies we serve.&lt;/p&gt;

&lt;p&gt;It is important to recognize that the average tenure at a tech company is three years, so your manager/report relationship will last on average only a year and a half: about 5% of a 30-year career. As a manager trying to build a high-functioning team, you should focus on making the most of this overlap to set them up for the other 95% of their career. Letting your direct report know that you are on their side and in it for the long haul will build the rapport necessary for candor.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Life stories make honest goals approachable&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;To build mutual respect, candor, and empathy, I like to take time in an early 1:1 to model them by sharing my life story, career journey, goals, and missteps. Next, it's your turn to listen intently to your report's own life story, noting that they may not be ready to share all of it. What you hear, though, may surprise and shock you. Life isn't always sunshine and roses.&lt;/p&gt;

&lt;p&gt;For subsequent 1:1s, I ask the report to record their goals, both within the company and outside of it. These goals stay at the top of our 1:1 notes so they remain fresh in our minds whenever we meet. Mine reads:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Career: make large-scale computing trivial for the world
to use in a sustainable manner.

Non-career: Accelerate human knowledge. Be a great father
and spouse.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With time, you should achieve the empathy and respect necessary to understand each other's personal and professional motivation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Purpose
&lt;/h2&gt;

&lt;p&gt;Many define successful management as the ability to get the most out of the team. Now that you have learned about each individual's motivation, it is time to find the common thread that binds the team together. This shared context will help you build a plan to harmoniously fit everyone's local maximum (personal motivation) into the global maximum (personal+team+company motivation). &lt;/p&gt;

&lt;p&gt;To discover this thread, I recommend first sharing the mission and value statements of other teams, and then letting your team discover and define their own statement of values. These values should be ideals or traits that your team cannot live without. Once they are defined, your team is ready to create a mission statement. A mission statement should:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Align with the mission of the group above, if applicable&lt;/li&gt;
&lt;li&gt;Clarify who your team serves&lt;/li&gt;
&lt;li&gt;Be worded precisely enough to not apply to other teams&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Depending on the team size, expect this to take about three 45-minute meetings to brainstorm and refine. Your team may resist the idea of brainstorming a mission at first, but once they learn what everyone agrees is essential, they will appreciate the clarity.&lt;/p&gt;

&lt;p&gt;For suggestions on how to run a successful exploration of mission statement &amp;amp; values, I highly recommend reading &lt;em&gt;Tribal Leadership: Leveraging Natural Groups to Build a Thriving Organization&lt;/em&gt; by Logan, King, and Fischer-Wright.&lt;/p&gt;

&lt;h2&gt;
  
  
  Craftsmanship
&lt;/h2&gt;

&lt;p&gt;Software engineers at their heart are craftspeople. As with any other craft trade, software engineers intrinsically want to be proud of what they create. Craftsmanship is an important trait to cultivate as it paves the way for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;High morale&lt;/li&gt;
&lt;li&gt;Product excellence&lt;/li&gt;
&lt;li&gt;Prevention of technical debt&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Even when one is racing the revenue clock, deprioritizing craftsmanship is never the answer. If you push engineers toward deadline-driven development, you may very well be causing them to lower the code quality bar. If not addressed immediately, the decreased code quality generates a vicious circle of low morale, low velocity, high turnover, and an ever-increasing pile of technical debt.&lt;/p&gt;

&lt;p&gt;If you encourage your team to build artifacts (code, design docs) that they can be proud of throughout their career, you will no longer have to worry much about motivation.&lt;/p&gt;

&lt;p&gt;For more on craftsmanship in software, I recommend &lt;em&gt;The Software Craftsman: Professionalism, Pragmatism, Pride&lt;/em&gt; by Sandro Mancuso.&lt;/p&gt;

</description>
      <category>management</category>
      <category>empathy</category>
      <category>purpose</category>
      <category>craftsmanship</category>
    </item>
    <item>
      <title>Tesla Model Y: family camping in below-zero temperatures</title>
      <dc:creator>Thomas Strömberg</dc:creator>
      <pubDate>Mon, 08 Mar 2021 04:42:07 +0000</pubDate>
      <link>https://dev.to/tstromberg/tesla-model-y-family-camping-in-below-zero-temperatures-od3</link>
      <guid>https://dev.to/tstromberg/tesla-model-y-family-camping-in-below-zero-temperatures-od3</guid>
      <description>&lt;p&gt;TL;DR: If you think outside of the box, you can sleep 4 with full climate control for a cost of 10-12% battery usage per night, even in freezing temperatures.&lt;/p&gt;

&lt;p&gt;When we bought our Tesla Model Y in December, the intent was to roughly recreate the experiences of our trusty 1971 VW Bus, but in electric form.&lt;/p&gt;

&lt;p&gt;With all the cargo space, and great mattress options such as the Tesmat, one can get pretty close to a passenger-van conversion. What one cannot easily simulate, however, is the 4-person sleeping experience of a pop-up tent. The Tesla Model Y can only comfortably fit 2 adults, or 1 adult and 2 children.&lt;/p&gt;

&lt;p&gt;The best option I've found for sleeping 4 is the &lt;a href="https://www.napieroutdoors.com/shop/suv-minivan-tents/sportz-suv-tent-model-82000/" rel="noopener noreferrer"&gt;Napier SUV Sportz Tent&lt;/a&gt;. This gargantuan tent attaches easily to the back of a Tesla Model Y, but it is a bit heavy and bulky to pack up. Its biggest advantage is that you can still run Camp Mode on the Tesla Model Y to heat the tent, which is something we tested this weekend.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqpg6gekla17hhzskhr3u.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqpg6gekla17hhzskhr3u.jpg" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Using Camp Mode to climate control the Napier tent is a party trick that is mostly useful only in extreme temperatures. In my experience, Camp Mode on the Tesla Model Y consumes ~10% of the battery when set to 67°F with a 40°F exterior temperature and the trunk closed.&lt;/p&gt;

&lt;p&gt;My primary concern was power consumption due to the lack of insulation: tents are typically nowhere near as insulated as a Tesla. How much power would we waste by adding the tent? &lt;/p&gt;

&lt;p&gt;To counter the insulation issue, I ordered a 3mm, 48"x50ft roll of insulation and a single roll of foil tape to build a ~5'x5'x5' cube within the Napier SUV Tent, lovingly called the "space station":&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F58zugggwfacby4g8gfbq.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F58zugggwfacby4g8gfbq.jpg" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The foil tape ran out quite quickly, so I finished up with some painter's tape.&lt;/p&gt;

&lt;p&gt;On the first night of testing, we arrived at 10pm with 68% battery; it was raining cats &amp;amp; dogs and a chilly 35°F. I was so busy trying to set up without getting soaked that I didn't bother sealing the gaps very well.&lt;/p&gt;

&lt;p&gt;I set the Tesla to Camp Mode at 67°F and used a screwdriver to flip the trunk latch, tricking the car into thinking the trunk was shut; I had heard that otherwise the climate control would shut off after 30 minutes. Our battery went from 67% to 47%: a 20% drop over 10 hours, which had me a bit worried for the second night.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftkg9brxdv9id0uqr7bdc.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftkg9brxdv9id0uqr7bdc.jpg" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;By the second night, we had acquired 2 extra rolls of foil tape (still not enough) and used some duct tape to seal the gaps between the car and the space station. I was more conservative this time, setting the car to 60°F and the vent manually to 1. After 7 hours, we had consumed only 5% of the battery, so I increased the heat to 66°F for the next 4 hours, which consumed another 5%. Not bad when the exterior temperature was 30°F!&lt;/p&gt;

&lt;p&gt;I recommend a stuff sack: otherwise, the packed inner tent takes up a lot of space in the car (a little more than the packed Napier SUV tent itself). Here's a photo of the inner tent before I rolled it up:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3gekpj3wb8ylwhof2r2g.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3gekpj3wb8ylwhof2r2g.jpg" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When we left the campsite, we were at 33% battery, which was just enough to get us to the supercharger in Ukiah. My backup plan was a slow charger just up the road in Lakeport, which I still tried for fun.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqowm3ocxsj7n0sotoqev.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqowm3ocxsj7n0sotoqev.jpg" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This experiment means that the insulated Napier tent is effectively as efficient, insulation-wise, as the Tesla itself: at worst, 10-20% more power consumption than regular Camp Mode with the trunk shut. Depending on your target temperature, plan on 10-12% battery consumption per night (11 hours) with Camp Mode in the Tesla Model Y.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0uf6zs2brsoui1ouavz8.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0uf6zs2brsoui1ouavz8.jpg" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This result means that the Tesla Model Y can sleep 4 people in freezing temperatures for multiple nights in a row without a source of electricity. I still have a few gaps to fill with foil tape, and need to bring painter's tape on the next trip to quickly seal the tent against the car.&lt;/p&gt;

&lt;p&gt;Hope this post helps someone!&lt;/p&gt;

</description>
      <category>tesla</category>
    </item>
    <item>
      <title>Go &amp; secondary groups: a kaniko adventure!</title>
      <dc:creator>Thomas Strömberg</dc:creator>
      <pubDate>Fri, 26 Feb 2021 02:21:37 +0000</pubDate>
      <link>https://dev.to/tstromberg/go-secondary-groups-a-kaniko-adventure-6mn</link>
      <guid>https://dev.to/tstromberg/go-secondary-groups-a-kaniko-adventure-6mn</guid>
      <description>&lt;p&gt;&lt;em&gt;Originally posted on my personal blog in April 2020&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I wanted to get my feet wet with &lt;a href="https://github.com/GoogleContainerTools/kaniko"&gt;Kaniko&lt;/a&gt;, an open-source in-cluster builder for Docker images. I happen to work with one of the maintainers, Tejal, so I asked her whether there were any UNIX-internals bugs that might be interesting.&lt;/p&gt;

&lt;p&gt;Here's the &lt;a href="https://github.com/GoogleContainerTools/kaniko/issues/1097"&gt;mystery issue&lt;/a&gt;: "The USER command does not set the correct gids, so extra groups are dropped". To reproduce it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;FROM ubuntu:latest
RUN groupadd -g 20000 bar
RUN groupadd -g 10000 foo
RUN useradd -c "Foo user" -u 10000 -g 10000 -G bar -m foo
RUN id foo
USER foo
RUN id
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In an ideal world, both &lt;code&gt;id&lt;/code&gt; commands should give the same output, but the second one did not include &lt;code&gt;foo&lt;/code&gt;'s membership in &lt;code&gt;bar&lt;/code&gt;. This definitely sounded like a secondary group issue. I happened to know that secondary groups were bolted onto UNIX some 10 years after primary groups (SVR4, by way of BSD).&lt;/p&gt;
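
&lt;p&gt;To make the distinction concrete, here is a minimal Go sketch. The &lt;code&gt;describeGroups&lt;/code&gt; helper is hypothetical and not part of Kaniko. &lt;code&gt;os.Getgroups&lt;/code&gt; reports the supplementary group IDs the kernel actually attached to the running process, while &lt;code&gt;user.GroupIds&lt;/code&gt; asks the group database which groups the user belongs to. Run after &lt;code&gt;USER foo&lt;/code&gt;, seeing &lt;code&gt;bar&lt;/code&gt; in the second list but not the first would be one symptom of the bug above:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;package main

import (
	"fmt"
	"os"
	"os/user"
)

// describeGroups prints the kernel's view of the current process's groups
// next to the group database's view of the same user. Hypothetical helper.
func describeGroups() error {
	// Supplementary group IDs attached to this process (getgroups(2)).
	groups, err := os.Getgroups()
	if err != nil {
		return fmt.Errorf("getgroups: %w", err)
	}
	fmt.Println("primary gid:", os.Getgid(), "supplementary (process):", groups)

	// Group memberships recorded for the user in the group database.
	u, err := user.Current()
	if err != nil {
		return fmt.Errorf("current user: %w", err)
	}
	ids, err := u.GroupIds()
	if err != nil {
		return fmt.Errorf("group ids for %s: %w", u.Username, err)
	}
	fmt.Println("groups (database) for", u.Username, ids)
	return nil
}

func main() {
	if err := describeGroups(); err != nil {
		fmt.Println("error:", err)
	}
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Note that &lt;code&gt;user.GroupIds&lt;/code&gt; returns an error as well as a slice, and callers should check it.&lt;/p&gt;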
&lt;h2&gt;
  
  
  How to reproduce
&lt;/h2&gt;

&lt;p&gt;First, get a shell in the Kaniko debug image, mounting the &lt;code&gt;out/&lt;/code&gt; and &lt;code&gt;integration/&lt;/code&gt; subdirectories:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;docker run -it --entrypoint /busybox/sh -v "$HOME"/.config/gcloud:/root/.config/gcloud -v "$(pwd)"/integration:/workspace -v "$(pwd)"/out:/out gcr.io/kaniko-project/executor:debug&lt;/code&gt;&lt;/p&gt;

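&lt;p&gt;I placed the reporter's Dockerfile at &lt;code&gt;kaniko/integration/1097&lt;/code&gt;. Since &lt;code&gt;integration/&lt;/code&gt; is mounted as &lt;code&gt;/workspace&lt;/code&gt;, it shows up in the container as &lt;code&gt;/workspace/1097&lt;/code&gt;. I could then trivially reproduce their case using Kaniko:&lt;/p&gt;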

&lt;p&gt;&lt;code&gt;/kaniko/executor -f 1097 --context=dir://workspace --destination=gcr.io/kaniko/test --tarPath=/tmp/image.tar --no-push&lt;/code&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Finding the culprit
&lt;/h2&gt;

&lt;p&gt;The first question was: how does Kaniko implement user switching? Does it switch in a way that populates secondary groups? I ask because the standard syscalls (&lt;code&gt;seteuid&lt;/code&gt;, &lt;code&gt;setegid&lt;/code&gt;) do not touch secondary groups: one has to call &lt;a href="https://linux.die.net/man/2/setgroups"&gt;&lt;code&gt;setgroups&lt;/code&gt;&lt;/a&gt; instead. Here's what I found:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;cmd&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;SysProcAttr&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Credential&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;syscall&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Credential&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;Uid&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;uid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Gid&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;gid&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://golang.org/pkg/syscall/#SysProcAttr"&gt;&lt;code&gt;SysProcAttr&lt;/code&gt;&lt;/a&gt; is not exactly a well-known feature in Go, but it's perfect for setting exec attributes such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;Chroot&lt;/code&gt; - lock the process into a directory&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Pdeathsig&lt;/code&gt; - signal that the process will get when its parent dies (Linux only)&lt;/li&gt;
&lt;li&gt;... and many options for user namespacing: handy for container tools.&lt;/li&gt;
&lt;/ul&gt;
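
&lt;p&gt;For comparison with the one-liner above, here is a minimal sketch that also fills in &lt;code&gt;Credential.Groups&lt;/code&gt;. The &lt;code&gt;commandAs&lt;/code&gt; helper is hypothetical and is not Kaniko's code. The IDs mirror the &lt;code&gt;foo&lt;/code&gt;/&lt;code&gt;bar&lt;/code&gt; Dockerfile above. In Go's Linux implementation, the child process calls &lt;code&gt;setgroups&lt;/code&gt; with this list whenever &lt;code&gt;Credential&lt;/code&gt; is set (unless &lt;code&gt;NoSetGroups&lt;/code&gt; is true), so leaving it empty clears any secondary groups:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;package main

import (
	"fmt"
	"os/exec"
	"syscall"
)

// commandAs prepares, but does not start, a command that will run with the
// given uid, primary gid, and supplementary groups. Hypothetical helper;
// syscall.Credential is only available on Unix-like systems.
func commandAs(uid, gid uint32, groups []uint32, name string, args ...string) *exec.Cmd {
	cmd := exec.Command(name, args...)
	cmd.SysProcAttr = new(syscall.SysProcAttr)
	cmd.SysProcAttr.Credential = new(syscall.Credential)
	*cmd.SysProcAttr.Credential = syscall.Credential{
		Uid:    uid,
		Gid:    gid,
		Groups: groups, // if left empty, the child's secondary groups are cleared
	}
	return cmd
}

func main() {
	// uid/gid 10000 is foo and gid 20000 is bar in the Dockerfile above.
	// Changing credentials requires privilege (typically root).
	cmd := commandAs(10000, 10000, []uint32{20000}, "id")
	out, err := cmd.CombinedOutput()
	if err != nil {
		fmt.Println("could not run as uid 10000 (not root?):", err)
		return
	}
	fmt.Print(string(out))
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;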

&lt;p&gt;So I figured it would be easy enough to extend the function to impersonate secondary groups as well. Your challenge, dear reader, is to find the flaw!&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;impersonate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;userStr&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;syscall&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Credential&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
   &lt;span class="o"&gt;...&lt;/span&gt;
   &lt;span class="n"&gt;groups&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="kt"&gt;uint32&lt;/span&gt;&lt;span class="p"&gt;{}&lt;/span&gt;
   &lt;span class="n"&gt;gidStr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GroupIds&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
   &lt;span class="n"&gt;logrus&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Infof&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"groupstr: %s"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;gidStr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

   &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;g&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="k"&gt;range&lt;/span&gt; &lt;span class="n"&gt;gidStr&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
       &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;strconv&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ParseUint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;g&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;32&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
       &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
           &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;errors&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Wrap&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"parseuint"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
       &lt;span class="p"&gt;}&lt;/span&gt;
       &lt;span class="n"&gt;groups&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;groups&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;uint32&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
   &lt;span class="p"&gt;}&lt;/span&gt;

   &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;syscall&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Credential&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
       &lt;span class="n"&gt;Uid&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;    &lt;span class="n"&gt;uid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;Gid&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;    &lt;span class="n"&gt;gid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;Groups&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;groups&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After running &lt;code&gt;make&lt;/code&gt;, I hop back into the container to run the repro case, and I'm perplexed by the log message:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;INFO[0013] u.GroupIds returned: []&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Is Kaniko running in some alternate chroot universe where it can't see &lt;code&gt;/etc/group&lt;/code&gt;? I double-check by shelling out from the same code:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;out, err = exec.Command("grep", "foo", "/etc/group").Output()&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;The answer is no. At this point, only two possibilities come to mind: either this is a Go bug or, if Go is using libc to make this call (likely), a libc bug, or at least a disagreement between the two. As soon as you have decided to blame the compiler, it's time to gather evidence, typically by building a simpler test case. I first investigate whether Go uses libc to look up the list of secondary groups, starting with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://golang.org/src/os/user/listgroups_unix.go"&gt;os/user/listgroups_unix.go&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A couple of nested functions later, and you can see that it's calling:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;mygetgrouplist&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="kt"&gt;char&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;gid_t&lt;/span&gt; &lt;span class="n"&gt;group&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;gid_t&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;groups&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;ngroups&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;getgrouplist&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;group&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;groups&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ngroups&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is nearly identical to the implementation in busybox's &lt;code&gt;id&lt;/code&gt; command (&lt;a href="https://github.com/brgl/busybox/blob/master/coreutils/id.c"&gt;source&lt;/a&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;get_groups&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;username&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;gid_t&lt;/span&gt; &lt;span class="n"&gt;rgid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;gid_t&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;groups&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
   &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;username&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
     &lt;span class="n"&gt;m&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;getgrouplist&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;username&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rgid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;groups&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
     &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
   &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Still, it's possible that Go is setting &lt;code&gt;ngroups&lt;/code&gt; to 0, so I build a little test program:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Lookup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Args&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
     &lt;span class="nb"&gt;panic&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Sprintf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"lookup failed: %v"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
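&lt;p&gt;The snippet above stops right after the lookup. For reference, here is a minimal, self-contained sketch of what the complete test program could look like. It is a reconstruction, not the author's exact &lt;code&gt;ggroups.go&lt;/code&gt;, and the &lt;code&gt;lookupGroups&lt;/code&gt; helper is a hypothetical name. It looks up the user named on the command line and prints what &lt;code&gt;User.GroupIds&lt;/code&gt; returns, surfacing any error instead of dropping it.&lt;/p&gt;

```go
// ggroups.go: a hedged reconstruction of the test program, not the original file.
// It looks up a user by name and prints the group IDs that os/user reports.
package main

import (
	"fmt"
	"os"
	"os/user"
)

// lookupGroups (hypothetical helper) returns the group IDs os/user reports
// for the named user, propagating both the lookup and the GroupIds errors.
func lookupGroups(name string) ([]string, error) {
	u, err := user.Lookup(name)
	if err != nil {
		return nil, fmt.Errorf("lookup failed: %v", err)
	}
	gids, err := u.GroupIds()
	if err != nil {
		return nil, fmt.Errorf("groupids failed: %v", err)
	}
	return gids, nil
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: ggroups USERNAME")
		os.Exit(1)
	}
	gids, err := lookupGroups(os.Args[1])
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s is in groups: %v\n", os.Args[1], gids)
}
```

&lt;p&gt;Assuming the file is saved as &lt;code&gt;ggroups.go&lt;/code&gt;, run it with &lt;code&gt;go run ggroups.go root&lt;/code&gt;. Running it once with &lt;code&gt;CGO_ENABLED=0&lt;/code&gt; and once without is a quick way to compare the cgo and non-cgo builds of &lt;code&gt;os/user&lt;/code&gt;.&lt;/p&gt;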



&lt;p&gt;The test program runs great on macOS, but when I use &lt;a href="https://github.com/karalabe/xgo"&gt;xgo&lt;/a&gt; to cross-compile it for Linux, all it outputs is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-rwxr-xr-x    1 0        0          2125099 Mar 28 20:26 ggroups-linux-amd64

# ./ggroups-linux-amd64
/busybox/sh: ./ggroups-linux-amd64: not found
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you ever see this error in UNIX, it usually means one of three things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The program specifies an invalid &lt;code&gt;#!&lt;/code&gt; line&lt;/li&gt;
&lt;li&gt;The binary needs a shared library that does not exist&lt;/li&gt;
&lt;li&gt;The binary is for the wrong architecture&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In this case, I suspected the second cause: since busybox is in use, chances are pretty high that this Docker image lacks libc. This environment does not have &lt;code&gt;ldd&lt;/code&gt;, but it does have &lt;code&gt;strings&lt;/code&gt;, so I can get some hints about the binary that was built:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;strings /out/ggroups-linux-amd64  | head
bhkFaBAPgWy3KAp2RQcd/llKGprZSMM7cCxIzwmJ9/0QgnPM9q9pk--9IIyIXn/X9bTurj9MBmKtnVL-ANT
/lib64/ld-linux-x86-64.so.2
ATUSH
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It looks like the right architecture, but yeah, that library doesn't exist. To confirm my sanity, I checked that this program works great in an Ubuntu container. I immediately suspect that either kaniko's user environment is trash, or kaniko is up to shenanigans in its &lt;code&gt;Makefile&lt;/code&gt;. The latter is easier to check, and it doesn't take long to notice:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight make"&gt;&lt;code&gt;&lt;span class="nl"&gt;out/executor&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;$(GO_FILES)&lt;/span&gt;
    &lt;span class="nv"&gt;GOARCH&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$(GOARCH)&lt;/span&gt; &lt;span class="nv"&gt;GOOS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;linux &lt;span class="nv"&gt;CGO_ENABLED&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;0 go build &lt;span class="nt"&gt;-ldflags&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
       &lt;span class="nv"&gt;$(GO_LDFLAGS)&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; &lt;span class="nv"&gt;$@&lt;/span&gt; &lt;span class="nv"&gt;$(EXECUTOR_PACKAGE)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;God damnit. kaniko works because it disables &lt;code&gt;cgo&lt;/code&gt; to work around the lack of a libc environment. Look back at &lt;a href="https://golang.org/src/os/user/listgroups_unix.go"&gt;listgroups_unix.go&lt;/a&gt;: it uses C code, and its build constraint restricts it to cgo builds. If we look at the fallback implementation, we see:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;func listGroups(*User) ([]string, error) {
    if runtime.GOOS == "android" || runtime.GOOS == "aix" {
        return nil, fmt.Errorf("user: GroupIds not implemented on %s", runtime.GOOS)
    }
    return nil, errors.New("user: GroupIds requires cgo")
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But wait - we didn't see an error in our impersonate function! I try to compile it without cgo:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;env CGO_ENABLED=0 go run ggroups.go root
panic: groupids failed: user: GroupIds requires cgo

goroutine 1 [running]:
main.main()
    /Users/tstromberg/src/ggroups/ggroups.go:18 +0x117
exit status 2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The mystery deepens
&lt;/h2&gt;

&lt;p&gt;If you see an error in one environment and not the other, chances are it's one of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A compiler bug&lt;/li&gt;
&lt;li&gt;A kernel bug&lt;/li&gt;
&lt;li&gt;You forgot to check the error code&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It's almost always the last option. Sure enough:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;   &lt;span class="n"&gt;gidStr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GroupIds&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
   &lt;span class="n"&gt;logrus&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Infof&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"groupstr: %s"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;gidStr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As soon as I noticed this, I walked away from my computer for an hour. I suggest you do the same.&lt;/p&gt;

</description>
      <category>kaniko</category>
      <category>go</category>
    </item>
    <item>
      <title>Persistent multi-user Docker on macOS</title>
      <dc:creator>Thomas Strömberg</dc:creator>
      <pubDate>Mon, 01 Feb 2021 20:04:10 +0000</pubDate>
      <link>https://dev.to/tstromberg/persistent-multi-user-docker-on-macos-32em</link>
      <guid>https://dev.to/tstromberg/persistent-multi-user-docker-on-macos-32em</guid>
      <description>&lt;p&gt;First, be aware that &lt;code&gt;docker&lt;/code&gt; is not designed to be securely shared among multiple users. Please assume that anyone who has access to &lt;code&gt;docker&lt;/code&gt; is effectively equivalent to `root'.&lt;/p&gt;

&lt;p&gt;This assumes that users will interact with &lt;code&gt;docker&lt;/code&gt; via the command line rather than graphically. It also assumes an environment that allows a single user to be logged in automatically via the GUI, though this is mostly out of laziness rather than an underlying technical restriction.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Choose an account that Docker Desktop will run as. I recommend creating a &lt;code&gt;docker&lt;/code&gt; user, but it could be any account. This account does not need admin access.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Open &lt;code&gt;Settings -&amp;gt; Users &amp;amp; Groups -&amp;gt; Login Options&lt;/code&gt;, and configure this user to log in automatically.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Create a shared containers directory:&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sudo mkdir -p /Users/Shared/Library/Containers
sudo chown docker:staff /Users/Shared/Library/Containers
sudo chmod -R 770 /Users/Shared/Library/Containers/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;ol start="4"&gt;
&lt;li&gt;Log in graphically with the account that will run Docker, start &lt;code&gt;/Applications/Docker.app&lt;/code&gt;, and answer any questions it asks.&lt;/li&gt;
&lt;li&gt;Open &lt;code&gt;Settings -&amp;gt; Users &amp;amp; Groups -&amp;gt; Login Items&lt;/code&gt;, and drag the &lt;code&gt;Docker&lt;/code&gt; app into it.&lt;/li&gt;
&lt;li&gt;Quit &lt;code&gt;Docker Desktop&lt;/code&gt; via its menu item.&lt;/li&gt;
&lt;li&gt;Open &lt;code&gt;Terminal&lt;/code&gt; and move your Docker data to a shared location that other users can write to:&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;mv ~/Library/Containers/com.docker.docker /Users/Shared/Library/Containers
chmod -R 770 /Users/Shared/Library/Containers/com.docker.docker
chmod -R +a "group:staff allow list,add_file,search,add_subdirectory,delete_child,readattr,writeattr,readextattr,writeextattr,readsecurity,file_inherit,directory_inherit" /Users/Shared/Library/Containers/com.docker.docker
chmod -R g+rw /Users/Shared/Library/Containers/com.docker.docker/Data
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Then link your local Docker data to this shared source, and make sure that others can traverse into this folder to resolve the socket symlink:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ln -s /Users/Shared/Library/Containers/com.docker.docker ~/Library/Containers/com.docker.docker
chmod g+x ~/Library ~/Library/Containers
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;ol start="8"&gt;
&lt;li&gt;Restart &lt;code&gt;/Applications/Docker.app&lt;/code&gt; to test.&lt;/li&gt;
&lt;li&gt;SSH into the host as another user, and run &lt;code&gt;docker run mariadb&lt;/code&gt; to test.&lt;/li&gt;
&lt;li&gt;Reboot the host and reconnect via SSH to test (it may take a moment for Docker to start up).&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This is the configuration we use for the #kernelcafe. Please add your improvements to the comments!&lt;/p&gt;

</description>
      <category>docker</category>
      <category>macos</category>
    </item>
    <item>
      <title>kernel café, toward alpha 1</title>
      <dc:creator>Thomas Strömberg</dc:creator>
      <pubDate>Fri, 22 Jan 2021 05:22:32 +0000</pubDate>
      <link>https://dev.to/tstromberg/kernel-cafe-toward-alpha-1-28m6</link>
      <guid>https://dev.to/tstromberg/kernel-cafe-toward-alpha-1-28m6</guid>
      <description>&lt;p&gt;I spent some time today in the data-hall, getting lighting setup, as well as a console. This weekend I aim to get the rack-boards installed on the walls.&lt;/p&gt;

&lt;p&gt;Here are the alpha milestones I aim to reach before the service becomes initially shareable:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Week 1&lt;/strong&gt;: Initial network setup, first node with SSH cert sync &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Week 2&lt;/strong&gt;: Tinkerbell setup and second node: Honeycomb LX2, installed via Tinkerbell. Trusted testers.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Week 3&lt;/strong&gt;: Third node: Mac Mini&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Week 4&lt;/strong&gt;: Nodes 4 &amp;amp; 5, Public Kubernetes Cluster&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Week 5&lt;/strong&gt;: Segregate Firewall, resource controls&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Beta!&lt;/p&gt;

&lt;p&gt;For the SSH cert synchronization setup, I'm considering basing it on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/kubernetes/test-infra/tree/master/prow/cmd/peribolos"&gt;periblos&lt;/a&gt; - YAML to GitHub Org sync&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/samber/sync-ssh-keys"&gt;sync-ssh-keys&lt;/a&gt; - GitHub Org to SSH keys&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The missing component, though, is a sync from GitHub Org to UNIX users and groups, which should be easy enough to solve.&lt;/p&gt;

</description>
      <category>operations</category>
    </item>
    <item>
      <title>Supply vs Demand</title>
      <dc:creator>Thomas Strömberg</dc:creator>
      <pubDate>Wed, 20 Jan 2021 01:38:36 +0000</pubDate>
      <link>https://dev.to/tstromberg/supply-vs-demand-20e7</link>
      <guid>https://dev.to/tstromberg/supply-vs-demand-20e7</guid>
      <description>&lt;p&gt;When running public service as a free-time effort, it is  critical to consider supply &amp;amp; demand as early on in the process as feasible.&lt;/p&gt;

&lt;p&gt;For most folks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;'supply' is the resources you can sink into the service&lt;/li&gt;
&lt;li&gt;'demand' is what the public wants out of this service&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you structure your service in such a way that demand immediately outstrips supply, you may have to quickly pivot in a manner that is disruptive to your users.&lt;/p&gt;

&lt;p&gt;On the other hand, building a service that no one wants is entirely a waste of your time.&lt;/p&gt;

&lt;h2&gt;
  
  
  Addressing demand
&lt;/h2&gt;

&lt;p&gt;For the kernel café, I'll focus initially on solving the use cases where I've seen demonstrable demand by open-source developers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Access to mixed-architecture Kubernetes clusters&lt;/li&gt;
&lt;li&gt;Interactive access to arm64 (Linux, macOS)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To reduce demand, I am considering the following limitations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;IPv6-only&lt;/li&gt;
&lt;li&gt;Public data only&lt;/li&gt;
&lt;li&gt;1-hour CPU limit&lt;/li&gt;
&lt;li&gt;Unprivileged access&lt;/li&gt;
&lt;li&gt;Subject to a contribution of time or money&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Addressing supply
&lt;/h2&gt;

&lt;p&gt;Maintaining open infrastructure is incredibly time consuming, and potentially expensive.&lt;/p&gt;

&lt;h3&gt;
  
  
  Time
&lt;/h3&gt;

&lt;p&gt;CNCF projects, such as &lt;a href="https://tinkerbell.org"&gt;Tinkerbell&lt;/a&gt;, make it substantially cheaper to work with physical infrastructure by allowing it to be managed as cattle rather than pets. Treating hosts as ephemeral, reinstalling them on each reboot, goes a long way toward keeping nodes maintainable over the long term. Where downtime is acceptable or even encouraged, requiring each node to reboot on a schedule (for example, weekly) also helps.&lt;/p&gt;

&lt;h3&gt;
  
  
  Money
&lt;/h3&gt;

&lt;p&gt;Setting up a janky datacenter is not expensive, but does require capital outlay.&lt;/p&gt;

&lt;p&gt;In my limited experience, the more worrying expense is power and cooling. Thankfully, both can be addressed by speccing out equipment that consumes no more power than necessary.&lt;/p&gt;

&lt;p&gt;I intend to build the backbone of the cluster with Raspberry Pi 4s, which seem unusually cost-effective, despite their I/O performance limitations. They consume only ~4 W at idle.&lt;/p&gt;

&lt;p&gt;I have four RockPro64s that I can throw into the mix, but they are a bit more exotic for a build-out, and are limited to 8GB. Their power consumption is similar. For now, interactive users will likely use a Honeycomb LX2.&lt;/p&gt;

&lt;p&gt;For x86 support (a necessary evil for platforms like Fuchsia), it is difficult to get similar power consumption numbers. Intel NUCs seem to strike the best balance of support and power consumption, even if AMD-based solutions trounce them in performance. Before I begin acquiring NUCs, I'll first need to collect data on power consumption versus performance.&lt;/p&gt;

&lt;p&gt;To be continued ...&lt;/p&gt;

</description>
      <category>serverless</category>
    </item>
    <item>
      <title>Dreaming of a Kernel Cafe</title>
      <dc:creator>Thomas Strömberg</dc:creator>
      <pubDate>Wed, 20 Jan 2021 00:35:36 +0000</pubDate>
      <link>https://dev.to/tstromberg/dreaming-of-a-kernel-cafe-191f</link>
      <guid>https://dev.to/tstromberg/dreaming-of-a-kernel-cafe-191f</guid>
      <description>&lt;p&gt;Remember the 1990's? &lt;/p&gt;

&lt;p&gt;When folks blessed with bandwidth set up shell servers, and invited their friends to share in the bounty?&lt;/p&gt;

&lt;p&gt;I just moved into a new house, and discovered that I was in this scenario again, for the first time in 20 years.&lt;/p&gt;

&lt;p&gt;Today, open-source developers need bandwidth less than access to dev &amp;amp; test environments that differ from their own. What if I combined the two ideas together to create something new?&lt;/p&gt;

&lt;p&gt;Thus was born the idea of the Kernel Cafe: Public Cloud Native Infrastructure, run by volunteers.&lt;/p&gt;

&lt;p&gt;It's not yet clear to me whether this idea has legs, but you can follow along here to watch.&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>unix</category>
    </item>
    <item>
      <title>Another puzzle to solve ...</title>
      <dc:creator>Thomas Strömberg</dc:creator>
      <pubDate>Sat, 05 Dec 2020 17:10:12 +0000</pubDate>
      <link>https://dev.to/tstromberg/another-puzzle-to-solve-3b9j</link>
      <guid>https://dev.to/tstromberg/another-puzzle-to-solve-3b9j</guid>
      <description>&lt;p&gt;I got into computing, as I admired how computers could connect people from a variety of backgrounds, and loved solving the puzzles it introduced.&lt;/p&gt;

&lt;p&gt;My latest hack: &lt;a href="http://github.com/tstromberg/campwiz"&gt;http://github.com/tstromberg/campwiz&lt;/a&gt;&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
