<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: John Mitchell</title>
    <description>The latest articles on DEV Community by John Mitchell (@johntellsall).</description>
    <link>https://dev.to/johntellsall</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F18696%2Fc339e37e-c594-41d2-b0a9-41dae98c0bda.jpeg</url>
      <title>DEV Community: John Mitchell</title>
      <link>https://dev.to/johntellsall</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/johntellsall"/>
    <language>en</language>
    <item>
      <title>Best for programming / Python</title>
      <dc:creator>John Mitchell</dc:creator>
      <pubDate>Sat, 30 May 2026 14:54:23 +0000</pubDate>
      <link>https://dev.to/johntellsall/best-for-programming-python-pem</link>
      <guid>https://dev.to/johntellsall/best-for-programming-python-pem</guid>
      <description>&lt;p&gt;When I volunteered at the Python booth at our local tech conference, the #1 question was "how do I get better at programming?"&lt;/p&gt;

&lt;p&gt;Fortunately there is an easy answer:&lt;/p&gt;

&lt;h2&gt;
  
  
  Automate the Boring Stuff
&lt;/h2&gt;

&lt;p&gt;By Al Sweigart&lt;/p&gt;

&lt;p&gt;Instead of teaching "programming" or "Python", he focuses on actually &lt;em&gt;doing&lt;/em&gt; things. First you learn some of the basics to get started, then you actually write real programs.&lt;/p&gt;

&lt;p&gt;By narrowly focusing on about 30% of the language, you learn &lt;em&gt;only&lt;/em&gt; what's really needed for everyday work. He's not teaching the next generation of "professional programmers"; he's focusing on everyday people doing everyday tasks.&lt;/p&gt;

&lt;p&gt;Best of all -- it's &lt;strong&gt;free&lt;/strong&gt;. The author has very generously spent years of work, and chose to give it away!&lt;/p&gt;

&lt;p&gt;There's even a workbook, to help practice and lock in the skills. It's also free! There's a video course! The first 15 episodes are free.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://automatetheboringstuff.com/" rel="noopener noreferrer"&gt;https://automatetheboringstuff.com/&lt;/a&gt;&lt;/p&gt;

</description>
      <category>automation</category>
      <category>beginners</category>
      <category>python</category>
      <category>resources</category>
    </item>
    <item>
      <title>Secret way to learning a LOT</title>
      <dc:creator>John Mitchell</dc:creator>
      <pubDate>Sat, 30 May 2026 14:41:56 +0000</pubDate>
      <link>https://dev.to/johntellsall/secret-way-to-learning-a-lot-52p5</link>
      <guid>https://dev.to/johntellsall/secret-way-to-learning-a-lot-52p5</guid>
      <description>&lt;p&gt;I adore programming books... for a specific use case. If I want to learn a lot about a subject, nothing is better than a book:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;it was edited, so ideas and text are consistent&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;the examples worked (at some point in time)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;the author has enough time-space to clearly describe a lot of material well&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;they stay relevant for a LONG time, so there's a lot of them, and they can be very inexpensive. I use &lt;a href="https://abebooks.com/" rel="noopener noreferrer"&gt;https://abebooks.com/&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Nothing beats a book for an enormous amount of detailed information. Even just reading the table of contents helps you understand what to focus on during study.&lt;/p&gt;

&lt;p&gt;My copy of "DevOps Handbook" is covered in tags and handwritten notes. Brett Slatkin's "Effective Python" is way too much for me, but I got a ton out of the parts I focused on.&lt;/p&gt;

&lt;h3&gt;
  
  
  No other media is like books!
&lt;/h3&gt;

&lt;p&gt;Reposted from &lt;a href="https://news.ycombinator.com/item?id=48273030" rel="noopener noreferrer"&gt;https://news.ycombinator.com/item?id=48273030&lt;/a&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Deploys should be obvious</title>
      <dc:creator>John Mitchell</dc:creator>
      <pubDate>Fri, 16 Jan 2026 05:05:15 +0000</pubDate>
      <link>https://dev.to/johntellsall/deploys-should-be-obvious-hdo</link>
      <guid>https://dev.to/johntellsall/deploys-should-be-obvious-hdo</guid>
      <description>&lt;p&gt;I &lt;em&gt;always&lt;/em&gt; use a timestamp as a postfix. Makes issues super easy to track.&lt;/p&gt;

&lt;p&gt;One time I did a deploy, and checked the service, and the deploy worked!... once. Then I reloaded the page, and the page was wrong, it was an &lt;em&gt;old&lt;/em&gt; version. Then I reloaded and the new version came back!&lt;/p&gt;

&lt;p&gt;I'd &lt;em&gt;added&lt;/em&gt; a running service. If it was originally running version 1, and I deployed version 2: first load would return v1 page, 2nd load would take v2, then it would flip back to v1, v2, v1...&lt;/p&gt;
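
&lt;p&gt;Here's a minimal sketch of how that flip-flop can happen. It assumes requests were spread round-robin across every running instance (an assumption; I'm not describing the real routing here), and the instance list and &lt;code&gt;serve&lt;/code&gt; helper are made up for illustration:&lt;/p&gt;

```python
from itertools import cycle

# Hypothetical pool: the old instance was never stopped, so both are live.
running_instances = ["v1", "v2"]

# Assumption: traffic is spread round-robin across all running instances.
next_instance = cycle(running_instances)


def serve():
    """Return the version that answers the next page load."""
    return next(next_instance)


# Five page loads in a row alternate between the old and new version.
page_loads = [serve() for _ in range(5)]
print(page_loads)
```

&lt;p&gt;A timestamp in each deployed name makes it obvious which instance answered.&lt;/p&gt;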

&lt;p&gt;That was... exciting.&lt;/p&gt;

</description>
      <category>cicd</category>
      <category>devjournal</category>
      <category>devops</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Tests vs Business Value</title>
      <dc:creator>John Mitchell</dc:creator>
      <pubDate>Wed, 17 Dec 2025 15:06:29 +0000</pubDate>
      <link>https://dev.to/johntellsall/tests-vs-business-value-30me</link>
      <guid>https://dev.to/johntellsall/tests-vs-business-value-30me</guid>
      <description>&lt;p&gt;Tests are in a higher level language, much simpler than app code. But, they are NOT free!&lt;/p&gt;

&lt;p&gt;A test which is wrong, or doesn't match business expectations, is worse than no test at all. With no test, you can tell you don't understand how the code works in some situations. With a wrong test, the bad info gets lost among the other tests.&lt;/p&gt;

&lt;p&gt;One time as a DevOps person I tried to help the App Devs with their work. A test failed. But I didn't understand the feature, the code, or the test well enough to know which one to fix! I did &lt;em&gt;NOT&lt;/em&gt; want to "just make it work"; that would be much worse than doing nothing. So I did nothing and moved on. I was... very salty that an expert Dev like myself couldn't help with a simple code/test problem, but that was the best value I could provide: do nothing, there's definitely a bug, I can't fix it.&lt;/p&gt;

</description>
      <category>discuss</category>
      <category>softwareengineering</category>
      <category>testing</category>
    </item>
    <item>
      <title>Junior Broken Feedback Loop</title>
      <dc:creator>John Mitchell</dc:creator>
      <pubDate>Wed, 17 Dec 2025 15:05:41 +0000</pubDate>
      <link>https://dev.to/johntellsall/junior-broken-feedback-loop-4d9l</link>
      <guid>https://dev.to/johntellsall/junior-broken-feedback-loop-4d9l</guid>
      <description>&lt;p&gt;It's a thinking challenge, not an AI challenge.&lt;/p&gt;

&lt;p&gt;A while back, a junior asked me a question. They wanted to do X, they had code, with error Y. So they searched for it, got a page on Stack Overflow, pasted "the answer", then got a new and different error.&lt;/p&gt;

&lt;p&gt;They:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;didn't understand the original code&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;didn't understand the original error&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is fine. They then searched for the error and found a relevant page.&lt;/p&gt;

&lt;p&gt;This is also fine. However, they:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;cut-pasted "an answer" from SO &lt;em&gt;without understanding if it was relevant or not&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The junior was hoping to work with a Puzzle: adding information will gradually give them a solution. In practice they are working with a Mystery: more information makes the task &lt;em&gt;harder&lt;/em&gt; since they can't distinguish between different aspects.&lt;/p&gt;

&lt;p&gt;I focused them on a few relevant details and let them go to it.&lt;/p&gt;

</description>
      <category>discuss</category>
      <category>beginners</category>
      <category>career</category>
      <category>learning</category>
    </item>
    <item>
      <title>&lt;3 RSS for learning</title>
      <dc:creator>John Mitchell</dc:creator>
      <pubDate>Fri, 29 Aug 2025 13:10:57 +0000</pubDate>
      <link>https://dev.to/johntellsall/3-rss-for-learning-720</link>
      <guid>https://dev.to/johntellsall/3-rss-for-learning-720</guid>
      <description>&lt;p&gt;I &lt;em&gt;adore&lt;/em&gt; RSS! Use it literally every single day. I have many feeds on Feedly.com, and add to it every week or two.&lt;/p&gt;

&lt;p&gt;Tip: use a service to stream quality content to your RSS feed reader. For Hacker News, &lt;a href="http://hnapp.com/" rel="noopener noreferrer"&gt;http://hnapp.com/&lt;/a&gt; does the trick for me.&lt;/p&gt;

&lt;p&gt;I subscribe to a couple dozen authors on Hacker News. If someone has great ideas, and writes well, I'm very happy to learn from them.&lt;/p&gt;

&lt;p&gt;Example: in hnapp, search for &lt;code&gt;author:bob1029&lt;/code&gt;, there's an RSS link, paste that into your RSS feed reader to see that person's Hacker News comments.&lt;/p&gt;

&lt;p&gt;I have an entire "Hacker News" section in Feedly, just with authors' comments. Very useful!&lt;/p&gt;

</description>
      <category>rss</category>
      <category>learning</category>
      <category>productivity</category>
      <category>discuss</category>
    </item>
    <item>
      <title>WIP notebooks</title>
      <dc:creator>John Mitchell</dc:creator>
      <pubDate>Mon, 14 Jul 2025 13:46:49 +0000</pubDate>
      <link>https://dev.to/johntellsall/wip-notebooks-59ne</link>
      <guid>https://dev.to/johntellsall/wip-notebooks-59ne</guid>
      <description>&lt;p&gt;I have two types of journals for work: a small-ish disc-bound one for time-based task planning, and a second for notes and research.&lt;/p&gt;

&lt;p&gt;The first one is a 5x7 I just got from the dollar store. It turns out they have a bunch that work just fine. Each morning I write out the hours and mark in the meetings I have. I write down what I'll say at the Standup (= what I'm working on). During the day I plot each task as it's being done in 15-minute increments, so I have a history of what I'm spending my time on.&lt;/p&gt;

&lt;p&gt;This has worked incredibly well! By tracking my time carefully I take &lt;em&gt;breaks&lt;/em&gt; and celebrate my wins. Before, things got mashed together and work could be a drag sometimes.&lt;/p&gt;

&lt;p&gt;The second journal is just a cheap 8.5 x 11 one. It contains: 1) notes "in the moment" when I'm working on something, 2) meeting notes, and 3) research notes.&lt;/p&gt;

&lt;p&gt;Capturing notes on paper makes a &lt;em&gt;big&lt;/em&gt; difference. I focus more, and re-reading it helps cement the info into my brain. If I have questions or unique ideas they get highlighted so I can discuss with the team later or another time.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>AI x Quality</title>
      <dc:creator>John Mitchell</dc:creator>
      <pubDate>Thu, 10 Jul 2025 15:15:19 +0000</pubDate>
      <link>https://dev.to/johntellsall/ai-x-quality-1jhc</link>
      <guid>https://dev.to/johntellsall/ai-x-quality-1jhc</guid>
      <description>&lt;p&gt;&lt;a href="https://martinfowler.com/articles/exploring-gen-ai/i-still-care-about-the-code.html" rel="noopener noreferrer"&gt;https://martinfowler.com/articles/exploring-gen-ai/i-still-care-about-the-code.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I'm a fan of AI. However at the end of the day what we do is build &lt;em&gt;features&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Martin notes that focusing on &lt;em&gt;tests first&lt;/em&gt; is a great way to go in the AI age, and I strongly agree.&lt;/p&gt;

&lt;p&gt;If feature code doesn't have tests, that's fine. As a team we don't pay for writing and maintaining the test. However if we get "bad" tests, that's SO MUCH WORSE than no tests at all!&lt;/p&gt;

&lt;p&gt;If AI writes a test, and a human merges it without thinking, or without thinking enough, that's bad. The entire feature is now at risk. Business and the tech team are lulled into a false sense of safety and security.&lt;/p&gt;

&lt;p&gt;The way we as developers know what and how to test, is by writing tests. We slowly gain experience and more deeply understand the business and their requirements. If we delegate easy tests to AI, we're letting our testing and &lt;em&gt;understanding&lt;/em&gt; skills weaken. We're at risk of not understanding the test, not understanding the code, and not understanding the business requirements.&lt;/p&gt;

&lt;p&gt;Recently a client complained that their test suite had too many "flaky" tests. They were spending a lot of time debugging the tests. Fortunately I had a simple solution: delete flaky tests with prejudice. See my previous post.&lt;/p&gt;

&lt;p&gt;A test's &lt;em&gt;only&lt;/em&gt; value is in critiquing the feature code. It has no other value or function. Delete it if it's not creating value.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>delete flaky tests with prejudice</title>
      <dc:creator>John Mitchell</dc:creator>
      <pubDate>Fri, 04 Jul 2025 17:07:49 +0000</pubDate>
      <link>https://dev.to/johntellsall/delete-flaky-tests-with-prejudice-k27</link>
      <guid>https://dev.to/johntellsall/delete-flaky-tests-with-prejudice-k27</guid>
      <description>&lt;p&gt;(In response to "What's your biggest challenge in proving your automated tests are truly covering everything important?" on &lt;a href="https://www.reddit.com/r/Terraform/comments/1lrlwfh/whats_your_biggest_challenge_in_proving_your/" rel="noopener noreferrer"&gt;Reddit&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;As a Software Engineer, your job is not writing tests. In fact your job is not writing code. It's delivering &lt;em&gt;features&lt;/em&gt; reliably and quickly. Tests are just one way to prove to yourself, the team, and the business, that the quality is high enough.&lt;/p&gt;

&lt;p&gt;It's a feedback loop.&lt;/p&gt;

&lt;p&gt;The best CICD "pipeline" I've ever used was just a shim which automatically runs the project-based tests. If you run the full suite locally, the pipeline won't do anything surprising and it's just a backstop.&lt;/p&gt;

&lt;p&gt;Learn your test tool very well, with an eye towards narrowing the &lt;em&gt;scope&lt;/em&gt; of tests which run after a code change. This increases the feedback speed.&lt;/p&gt;

&lt;p&gt;If you're doing Python: &lt;code&gt;pytest&lt;/code&gt; has options like &lt;code&gt;--failed-first&lt;/code&gt; ("run the last-failing tests first, then continue with the rest") and &lt;code&gt;--last-failed&lt;/code&gt; ("rerun only the failures") which make it stupid simple to have a super fast dev loop. (Please comment on how to do this with your language/tool, I'm curious)&lt;/p&gt;

&lt;p&gt;One tool I use on 100% of my projects is a little thing that runs a script when a file changes. Get to know it and love it, or find a replacement. &lt;a href="https://jvns.ca/blog/2020/06/28/entr/" rel="noopener noreferrer"&gt;https://jvns.ca/blog/2020/06/28/entr/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;My core dev loop is:&lt;br&gt;
1) write a little test with high-level thoughts about the feature&lt;br&gt;
2) write a little code that implements some of the feature&lt;br&gt;
3) execute "run tests when files change" in a terminal&lt;/p&gt;

&lt;p&gt;Then the feedback loop is &lt;em&gt;very&lt;/em&gt; fast: edit the high-level test, save the file to immediately see if it worked. Or, add code to the implementation, save the file to immediately see if it worked.&lt;/p&gt;

&lt;p&gt;Very often I'm not sure about what to do so I put a "drop into debugger" command into the test or code and then rerun the test. It does some stuff then gives me an interactive prompt. I can single-step the code/test, examine variables, even make API calls. So much fun!&lt;/p&gt;
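
&lt;p&gt;In Python, the "drop into debugger" command can be the built-in &lt;code&gt;breakpoint()&lt;/code&gt;. A small sketch; the &lt;code&gt;apply_discount&lt;/code&gt; function and its &lt;code&gt;debug&lt;/code&gt; flag are hypothetical, only there to show where the call goes:&lt;/p&gt;

```python
def apply_discount(price, percent, debug=False):
    """Hypothetical feature code used only to show where the debugger goes."""
    discount = price * percent / 100
    if debug:
        # Built-in since Python 3.7: pauses here and opens an interactive
        # pdb prompt to single-step and inspect `price` and `discount`.
        breakpoint()
    return price - discount


# Normal run: no prompt. Pass debug=True to stop inside the function.
print(apply_discount(200, 25))
```

&lt;p&gt;In real work I'd usually just paste the bare &lt;code&gt;breakpoint()&lt;/code&gt; line in temporarily and delete it afterwards; the flag only keeps this example non-interactive.&lt;/p&gt;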

</description>
    </item>
    <item>
      <title>Bootstrapping clarification</title>
      <dc:creator>John Mitchell</dc:creator>
      <pubDate>Wed, 04 Jun 2025 14:25:38 +0000</pubDate>
      <link>https://dev.to/johntellsall/bootstrapping-clarification-2j56</link>
      <guid>https://dev.to/johntellsall/bootstrapping-clarification-2j56</guid>
      <description>&lt;p&gt;My reviewers pointed out the "Bootstrapping an Infrastructure in 2025" article could use some clarification.&lt;/p&gt;

&lt;p&gt;The first part of setting up a cluster has these parts:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Version Control - CVS, track who made changes, backout
Gold Server - only require changes in one place
Host Install Tools - install hosts without human intervention
Ad Hoc Change Tools - 'expect', to recover from early or big problems
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;"Version Control" these days is Git.&lt;/p&gt;

&lt;p&gt;"Host Install Tools" are tools so that when a new computer is booted, it's setup with a base operating system, so it can become a functioning member of the cluster. In other words, PXE. In cloud world it's like AMI or Packer or Docker images.&lt;/p&gt;

&lt;p&gt;A "Gold Server" is a server that's central to managing the cluster. Instead of making changes to each individual service machine, an admin registers the change centrally, then lets the cluster make the changes happen. "Ad Hoc Change Tools" is ssh (manual changes) vs the standard path. Ad hoc changes are flexible but dangerous.&lt;/p&gt;

&lt;p&gt;When the paper was written, computers were individual little snowflakes. To fix a database server, you'd connect using ssh to the server, figure out what's wrong, then run commands or edit files on the server to fix the issue. This method is fun, effective, and flexible, but breaks down almost instantly. You don't remember what you changed. Other people can change things randomly, and also forget. The system doesn't crash per se, but mostly works. This is worse. The system works except sometimes it acts really strangely and causes an enormous amount of effort to fix.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;Bootstrapping&lt;/code&gt; paper recommends another way to make changes:&lt;br&gt;
1) set up a change in the central "gold" server. Example:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;database servers should have "postgres" process running
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;2) from the gold server, trigger some or all other servers to check for changes&lt;/p&gt;

&lt;p&gt;3) when a database server checks the central server, it'll find the "make sure postgres is running" change, and execute that change.&lt;/p&gt;

&lt;p&gt;This has a lot of advantages. The major one is "eventual consistency". Changes eventually make it out to all the correct machines.&lt;/p&gt;

&lt;p&gt;In a medium or large cluster, very often changes fail. The server isn't up, or is too busy, or something else is going on. A centrally-pushed change is applied to only a subset of servers.&lt;/p&gt;

&lt;p&gt;In the "pull" style, each server periodically polls the central gold server for changes. Changes set up once, in the central server, eventually are applied to the appropriate machines.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>I despise bash but...</title>
      <dc:creator>John Mitchell</dc:creator>
      <pubDate>Wed, 04 Jun 2025 13:44:35 +0000</pubDate>
      <link>https://dev.to/johntellsall/i-despise-bash-but-2m9o</link>
      <guid>https://dev.to/johntellsall/i-despise-bash-but-2m9o</guid>
      <description>&lt;p&gt;... use it constantly. It's just so useful.&lt;/p&gt;

&lt;p&gt;Two tips:&lt;/p&gt;

&lt;p&gt;1) first line of ALL SCRIPTS is:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;set -euo pipefail # strict mode
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This makes the script crash as soon as any command gets an error, or any segment of a pipe gets an error, so you can fix it. It'll also crash if a variable gets used before being set.&lt;/p&gt;

&lt;p&gt;A program that does the wrong thing then silently continues, is a bad bad program.&lt;/p&gt;

&lt;p&gt;2) rewrite the script in a real language (Python?) if it has more than 3 conditionals or loops.&lt;/p&gt;

&lt;p&gt;Personally I find conditionals to be do-able in Bash, but loops tend to be problematic.&lt;/p&gt;

&lt;p&gt;I've written thousands of lines of Perl and Awk and other things in my day, but Bash and Python cover 100% of my work these days.&lt;/p&gt;

&lt;p&gt;BONUS:&lt;/p&gt;

&lt;p&gt;3) &lt;code&gt;set -o xtrace&lt;/code&gt; also known as &lt;code&gt;set -x&lt;/code&gt; is also great. (&lt;code&gt;set +x&lt;/code&gt; turns it back off.)&lt;/p&gt;

&lt;p&gt;It prints each command before it's executed, making what the code does really obvious. We love obvious.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Bootstrapping an Infrastructure in 2025</title>
      <dc:creator>John Mitchell</dc:creator>
      <pubDate>Tue, 03 Jun 2025 14:16:35 +0000</pubDate>
      <link>https://dev.to/johntellsall/bootstrapping-an-infrastructure-in-2025-4bpc</link>
      <guid>https://dev.to/johntellsall/bootstrapping-an-infrastructure-in-2025-4bpc</guid>
      <description>&lt;h1&gt;
  
  
  Notes on "Bootstrapping an Infrastructure"
&lt;/h1&gt;

&lt;p&gt;My job is Cloud Ops at a large media company. We're moving a ton of users and other resources to a new cloud tenant. I enjoyed the opportunity to re-visit the classic paper &lt;a href="https://www.usenix.org/conference/lisa-98/bootstrapping-infrastructure" rel="noopener noreferrer"&gt;Bootstrapping an Infrastructure&lt;/a&gt; by Steve Traugott and Joel Huddleston, published all the way back in 1998. They compare booting a &lt;em&gt;cluster&lt;/em&gt; to booting a &lt;em&gt;computer&lt;/em&gt; - each is composed of a large set of services, each one supporting the following ones.&lt;/p&gt;

&lt;h2&gt;
  
  
  Steps
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4px4yqqze4mcopasxlct.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4px4yqqze4mcopasxlct.png" alt="bootstrap diagram" width="800" height="184"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://www.mermaidchart.com/app/projects/87c361a0-23ad-447c-9eff-6206afd56792/diagrams/0bac8fc6-634a-4c80-807d-49bf6477b2f9/version/v0.1/edit" rel="noopener noreferrer"&gt;bootstrap diagram&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary of paper
&lt;/h2&gt;

&lt;p&gt;They model a cloud made of many machines as a single machine, not as a collection of "pet" computers.&lt;br&gt;
By following a specific series of steps, each one supporting the others, a single cloud is constructed.&lt;/p&gt;

&lt;p&gt;The paper has a whole section on "Infrastructure Thinking":&lt;/p&gt;

&lt;p&gt;&lt;code&gt;Providing capable, reliable infrastructures which grant easy access to applications makes users happier and tends to raise the sysadmin's quality of life.&lt;/code&gt;&lt;br&gt;
&lt;code&gt;&lt;br&gt;
The "virtual machine" concept simplified how we maintained individual hosts. Upon adopting this mindset, it immediately became clear that **all nodes in a "virtual machine" infrastructure needed to be generic, each providing a commodity resource to the infrastructure.** It became a relatively simple operation to add, delete, or replace any node.&lt;br&gt;
&lt;/code&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Commentary
&lt;/h2&gt;

&lt;p&gt;The 16 steps are in four layers, each one building atop the layers that came before. Each layer focuses on a single audience, and delivers a specific feature in the cluster to that audience.&lt;/p&gt;

&lt;p&gt;The four layers are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Infrastructure&lt;/li&gt;
&lt;li&gt;Support&lt;/li&gt;
&lt;li&gt;Client Hosts&lt;/li&gt;
&lt;li&gt;Cluster services&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Infrastructure
&lt;/h3&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Version Control - CVS, track who made changes, backout
Gold Server - only require changes in one place
Host Install Tools - install hosts without human intervention
Ad Hoc Change Tools - 'expect', to recover from early or big problems
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;These tools support the cluster management, and are designed for the cluster admins only. Like all layers, they support the layers above.&lt;/p&gt;

&lt;p&gt;Coming from a modern/cloud perspective, this is very familiar and very different. Version control and a central "server" make sense. Host Install means PXE: machine boots, asks central server what OS distribution and customization to install, and does that over minutes/hours. The modern equivalent would be an AWS AMI (machine image) or HashiCorp Packer or Docker image.&lt;/p&gt;

&lt;p&gt;This is great: start with nothing, install a full blob all at once. If it doesn't work as expected, iterate. Tweaking individual machines is fine for experimentation, as long as you acknowledge that local data is &lt;em&gt;ephemeral&lt;/em&gt; and will be reset soon.&lt;/p&gt;

&lt;h3&gt;
  
  
  Support
&lt;/h3&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Directory Servers - DNS, NIS, LDAP
Authentication Servers - NIS, Kerberos
Time Synchronization - NTP
Network File Servers - NFS, AFS, SMB
File Replication Servers - SUP
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;These services provide low-level data to the cluster and to users. DNS (cluster-wide network names) and LDAP (~ shared user, printer, other resource info) provide trusted low-level data to the cluster. Authentication requires &lt;em&gt;two way&lt;/em&gt; trust: a server only allows access if the person knows a secret password. NFS (Network File System) provides raw storage to the cluster, to be used by higher-level services.&lt;/p&gt;

&lt;p&gt;The Support services make different types of data available to the cluster.&lt;/p&gt;

&lt;p&gt;Surprise: the concept of "replication", where data is centrally managed but then &lt;em&gt;copied&lt;/em&gt; to local machines, isn't something I've seen much. I guess it makes sense. In the world of physical machines, being able to provide apps to local users even if the network is gone, is a great idea.&lt;/p&gt;

&lt;h3&gt;
  
  
  Client
&lt;/h3&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Client File Access - automount, AMD, autolink
Client OS Update - rc.config, configure, make, cfengine
Client Configuration Management - cfengine, SUP, CVSup
Client Application Management - autosup, autolink
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Unlike modern cloud, the paper talks about apps running locally on each physical machine.&lt;/p&gt;

&lt;p&gt;The Client services manage app support at the single, cluster level:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Automount: each machine makes available specific parts of the shared network file system for the local user(s) and services.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;OS Update: machine operating systems are managed centrally.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Client Configuration and Application Management: at this layer, individual machine differences are managed centrally. If a user wants to make a local change, it's set up and managed centrally, so the entire machine can be replaced without concern.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Cluster Services
&lt;/h3&gt;

&lt;p&gt;Mail and Printing: these are user-level services that are managed and maintained centrally, but are understandable and directly usable by the end users.&lt;/p&gt;

&lt;p&gt;Monitoring: another cluster-level service; this one's audience is the admins themselves.&lt;/p&gt;

&lt;h2&gt;
  
  
  More Commentary
&lt;/h2&gt;

&lt;p&gt;Reading this paper from the late 1990s was enlightening and also surreal. Many details have changed (Perl! Cfengine! &lt;em&gt;brrrrrr&lt;/em&gt;), but the overall flow of the ideas is 100% solid. In the modern world, when a new cloud provider or "tenant" is onboarded, the sequence of layers is extremely similar.&lt;/p&gt;

&lt;p&gt;Surprise: of the dozens of tools/services mentioned only &lt;em&gt;two&lt;/em&gt; are still in common use: DNS and NTP. And admins &lt;em&gt;still&lt;/em&gt; love to complain about DNS breaking things.&lt;/p&gt;

&lt;p&gt;Surprise: the authors didn't divide things into layers, nor did they mention "audience" except for "client" as in user-facing apps.&lt;/p&gt;

&lt;p&gt;Surprise: no security services! I guess a Web Application Firewall would be a big ask in the 1990s, but a central "these things are happening on those machines by these users" service would be valuable. E.g. AWS CloudTrail or CloudWatch Logs or Splunk.&lt;/p&gt;

&lt;p&gt;Similarly, no app dev services: application logs or traceback collectors. ~ New Relic, Datadog. Years ago as a dev we &lt;em&gt;lived&lt;/em&gt; by our Sentry app showing us where our app was crashing.&lt;/p&gt;

&lt;p&gt;Surprise: authors put "cluster monitoring" at the very &lt;em&gt;end&lt;/em&gt; of the process. They mentioned never getting around to central logging! This was shocking to me: they spend a huge amount of time controlling each layer, without the support of cluster-level feedback mechanisms. "Cluster Admin" is an important audience. Cluster-wide services can be divided into "Infra" (for the admins), or "Common" (for end users). Central logging and networking and security services are "Infra", CICD pipelines are "Common".&lt;/p&gt;

&lt;p&gt;I study and teach Feedback Loops. The central idea is: 1) make a change, 2) receive feedback, 3) adjust the next change loop based on feedback. Presumably the authors would ssh into each machine, make a change from the central server, then watch on the local machine what happened. This is fine: it's easy to get multiple high quality logs and other data locally. However some problems only show up at the cluster level, over larger time scales.&lt;/p&gt;

&lt;p&gt;Developers talk about "Test Driven Development". Instead of developing a feature by writing code, a feature is developed by 1) writing a test which fails, 2) writing "just enough" feature code to get the test to pass, then 3) refactoring the test and code. Tests are an &lt;em&gt;artifact&lt;/em&gt; that requires investment but gives value &lt;em&gt;forever&lt;/em&gt;. Test automation gives devs and the business the &lt;em&gt;confidence&lt;/em&gt; that new changes don't break business-critical features.&lt;/p&gt;
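
&lt;p&gt;A tiny sketch of that loop, using only the standard library. The &lt;code&gt;slugify&lt;/code&gt; feature and its test are hypothetical, picked just to keep the example short:&lt;/p&gt;

```python
import unittest


def slugify(title):
    """Step 2: "just enough" feature code to make the test below pass."""
    return title.strip().lower().replace(" ", "-")


class SlugifyTest(unittest.TestCase):
    """Step 1: written first; it fails until slugify() exists and works."""

    def test_spaces_become_dashes(self):
        self.assertEqual(
            slugify(" Deploys Should Be Obvious "),
            "deploys-should-be-obvious",
        )


# Step 3 (refactor) happens with this suite as the safety net.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(SlugifyTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())
```

&lt;p&gt;The test stays behind as the artifact: any later change to &lt;code&gt;slugify&lt;/code&gt; gets the same check for free.&lt;/p&gt;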

&lt;p&gt;For a cluster (or cloud tenant), this helps tremendously. Build the cluster-wide feedback services &lt;em&gt;first&lt;/em&gt;. This gives rapid, reliable, actionable feedback to the whole bootstrap process.&lt;/p&gt;

&lt;p&gt;Test Driven Development hasn't reached all of the Cloud / DevOps world for some reason. Maybe it's time for me to publish more articles and videos...&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
