<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: memattchung</title>
    <description>The latest articles on DEV Community by memattchung (@memattchung).</description>
    <link>https://dev.to/memattchung</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F632909%2F3d516c01-9947-4ed6-91b8-2249da7ab0ea.png</url>
      <title>DEV Community: memattchung</title>
      <link>https://dev.to/memattchung</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/memattchung"/>
    <language>en</language>
    <item>
      <title>CloudWatch Metrics: Stop averaging, start percentiling</title>
      <dc:creator>memattchung</dc:creator>
      <pubDate>Sat, 17 Sep 2022 18:07:47 +0000</pubDate>
      <link>https://dev.to/memattchung/cloudwatch-metrics-stop-averaging-start-percentiling-3gmi</link>
      <guid>https://dev.to/memattchung/cloudwatch-metrics-stop-averaging-start-percentiling-3gmi</guid>
      <description>&lt;p&gt;AWS CloudWatch is a cornerstone service used by almost all AWS service teams for monitoring and scaling software systems. Though it is a foundational service that most businesses could benefit from, CloudWatch’s features are unintuitive and therefore often overlooked.&lt;/p&gt;

&lt;p&gt;Out of the box, CloudWatch offers users the ability to plot both standard infrastructure and custom application metrics. However, new users can easily make the fatal mistake of plotting their graphs using the default statistic: average. Stop right there! Instead of averages, use percentiles. By switching the statistic type, you are bound to uncover operational issues that have been hiding right underneath your nose.&lt;/p&gt;

&lt;p&gt;In this post, you’ll learn:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;How averages can hide performance issues&lt;/li&gt;
&lt;li&gt;Why software teams favor percentiles&lt;/li&gt;
&lt;li&gt;How percentiles are calculated&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Example scenario: Slowness hiding in plain sight
&lt;/h2&gt;

&lt;p&gt;Imagine the following scenario between a product manager, A, and an engineer, B, both of them working for SmallBusiness.&lt;/p&gt;

&lt;p&gt;A sends B a slack message, alerting B that customers are reporting slowness with CoffeeAPI:&lt;/p&gt;

&lt;p&gt;A: “Hey — some of our customers are complaining. They’re saying that CoffeeAPI is slower than usual”.&lt;/p&gt;

&lt;p&gt;B: “One second, taking a look…”&lt;/p&gt;

&lt;p&gt;B signs into the AWS Console and pulls up the CloudWatch dashboard. Once the page loads, he scrolls down to the graph that plots CoffeeAPI latency, &lt;code&gt;execution_runtime_in_ms&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;He quickly reviews the graph for the relevant time period, the last 24 hours.&lt;/p&gt;

&lt;p&gt;There’s no performance issue, or so it seems. Every data point sits below the team-defined threshold of 600 milliseconds:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--QXVotwwi--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://mattchung.me/wp-content/uploads/2022/09/CleanShot-2022-09-16-at-05.02.19.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--QXVotwwi--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://mattchung.me/wp-content/uploads/2022/09/CleanShot-2022-09-16-at-05.02.19.png" alt="Plotting the average execution time in milliseconds" width="880" height="186"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;B: “Um… looks good to me,” B reports back.&lt;/p&gt;

&lt;p&gt;A: “Hmm…customers are definitely saying the system takes as long as 900ms…”&lt;/p&gt;

&lt;h2&gt;
  
  
  Switching up the statistic from avg to p90
&lt;/h2&gt;

&lt;p&gt;B has a gut feeling that something’s off; something isn’t adding up. Are customers misreporting issues?&lt;/p&gt;

&lt;p&gt;Second-guessing himself, B modifies the line graph, duplicating the &lt;code&gt;execution_runtime_in_ms&lt;/code&gt; metric. He tweaks one setting: under the &lt;strong&gt;statistic&lt;/strong&gt; field, he swaps out Average for P90.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--PCQUudVC--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://mattchung.me/wp-content/uploads/2022/09/CleanShot-2022-09-16-at-04.59.20.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--PCQUudVC--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://mattchung.me/wp-content/uploads/2022/09/CleanShot-2022-09-16-at-04.59.20.png" alt="Duplicating the metric and changing statistic to P90" width="880" height="145"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;He refreshes the page and boom — there it is: datapoints revealing latency above 600 milliseconds!&lt;/p&gt;

&lt;p&gt;Some customers’ requests are even taking as long as 998 milliseconds, almost 400 milliseconds above the team’s defined service level objective (SLO).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--GbZ_8ryr--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://mattchung.me/wp-content/uploads/2022/09/CleanShot-2022-09-16-at-05.01.25.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--GbZ_8ryr--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://mattchung.me/wp-content/uploads/2022/09/CleanShot-2022-09-16-at-05.01.25.png" alt="P90 comparison" width="880" height="186"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Problematic averages
&lt;/h2&gt;

&lt;p&gt;Using CloudWatch metrics may seem simple at first, but it’s not that intuitive. What’s more, CloudWatch plots metrics with the average as the default statistic. As we saw above, this can hide outliers.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Plans based on assumptions about average conditions usually go wrong.&lt;/p&gt;

&lt;p&gt;Sam Savage&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;For any given metric with multiple data points, the average may show no change in behavior throughout the day, when really, there are significant changes.&lt;/p&gt;

&lt;p&gt;Here’s another example: let’s say we want to measure the number of requests per second.&lt;/p&gt;

&lt;p&gt;Sounds simple, right? Not so fast.&lt;/p&gt;

&lt;p&gt;First we need to talk about measurements. Do we measure once a second, or do we average requests over a minute? As we have already discovered, averaging can hide load that arrives in short bursts. Let’s consider a 60 second period as an example. If the system receives 200 requests per second during the first 30 seconds and zero requests per second during the last 30 seconds, then the average is 100 requests per second. In reality, however, the “instantaneous load” is twice that amount. The same holds if 200 requests arrive in each odd-numbered second and none in the others.&lt;/p&gt;
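
&lt;p&gt;To make the arithmetic concrete, here is a minimal Python sketch of that 60 second window. The variable names are made up for illustration, and the traffic shape is the hypothetical one described above, not measured data.&lt;/p&gt;

```python
# Hypothetical traffic: 200 requests in each of the first 30 seconds, then none.
requests_per_second = [200] * 30 + [0] * 30

# Averaging over the full minute flattens the burst.
average_load = sum(requests_per_second) / len(requests_per_second)

# The busiest single second shows what the system actually had to absorb.
peak_load = max(requests_per_second)

print(f"average: {average_load:.0f} req/s")
print(f"peak:    {peak_load} req/s")
```

&lt;p&gt;The average reports 100 requests per second even though the system had to handle 200 requests per second for half of the window.&lt;/p&gt;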

&lt;h2&gt;
  
  
  How to use Percentiles
&lt;/h2&gt;

&lt;p&gt;Using percentiles makes for smoother software.&lt;/p&gt;

&lt;p&gt;Swapping out average for percentile is advantageous for two reasons: &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Percentiles are not skewed by outliers&lt;/li&gt;
&lt;li&gt;Just as important, every percentile data point is an actual user experience, not a computed value like an average&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Continuing with the above example of a metric that tracks execution time, imagine an application publishing the following data points:&lt;/p&gt;

&lt;p&gt;[535, 400, 735, 999, 342, 701, 655, 373, 248, 412]&lt;/p&gt;

&lt;p&gt;If you average the above data, it comes out to 540 milliseconds, yet for the P90, we get 999 milliseconds. Here’s how we arrived at that number:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--IQXLGkP7--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://mattchung.me/wp-content/uploads/2022/09/p90-calculation-05.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--IQXLGkP7--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://mattchung.me/wp-content/uploads/2022/09/p90-calculation-05.png" alt="How to calculate the P90" width="880" height="660"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let’s walk through the above graphic to calculate the P90. First, sort all the data points for a given time period in ascending order. Next, split the data points into two buckets. If you want the P90, put the first 90% of data points into bucket one and the remaining 10% into bucket two. Similarly, if you want the P50 (i.e. the median), assign 50% of the data points to the first bucket and 50% to the second.&lt;/p&gt;

&lt;p&gt;Finally, after separating the data points into the two buckets, you select the first datapoint in the second bucket. The same steps can be applied to any percentile (e.g. P0, P50, P99).&lt;/p&gt;
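
&lt;p&gt;The steps above can be sketched in a few lines of Python. This is only an illustration of the two-bucket method from this post; the function name &lt;code&gt;percentile&lt;/code&gt; is made up, the fallback for an empty second bucket (e.g. P100) is an assumption, and CloudWatch’s internal calculation may differ.&lt;/p&gt;

```python
def percentile(data_points, p):
    """Return the p-th percentile using the two-bucket method described above."""
    if not data_points:
        raise ValueError("need at least one data point")
    # Step 1: sort in ascending order.
    ordered = sorted(data_points)
    # Step 2: bucket one holds the first p% of the sorted data points.
    bucket_one_size = int(len(ordered) * p / 100)
    # Step 3: pick the first data point in bucket two. When bucket two is
    # empty (p = 100), fall back to the largest data point (an assumption).
    index = min(bucket_one_size, len(ordered) - 1)
    return ordered[index]


latencies_ms = [535, 400, 735, 999, 342, 701, 655, 373, 248, 412]

print(sum(latencies_ms) / len(latencies_ms))  # average: 540.0
print(percentile(latencies_ms, 90))           # P90: 999
```

&lt;p&gt;With ten data points, the first nine land in bucket one, so the P90 is the single remaining value: 999 milliseconds.&lt;/p&gt;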

&lt;p&gt;Some common percentiles that you can use are p0, p50, p90, p99 and p99.9. You’ll want to use different percentiles for different alarm thresholds (more on this in an upcoming blog post). Say you are exploring CPU utilization: the p0, p50, and p100 give you the lowest usage, median usage, and highest usage, respectively.&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;To conclude, make sure that you’re using percentiles instead of averages so that when you use CloudWatch, your graphs aren’t giving you false negatives.&lt;/p&gt;

&lt;p&gt;Take your existing graphs and switch over your statistics from average to percentile today, and start uncovering hidden operational issues. Let me know if you make the change and how it positively impacts your systems.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Get my tutorials delivered straight to your inbox and sign up for my newsletter by &lt;a href="https://mattchung.me/"&gt;clicking here&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;p&gt;Chris Jones. “Google – Site Reliability Engineering.” Accessed September 12, 2022. &lt;a href="https://sre.google/sre-book/service-level-objectives/"&gt;https://sre.google/sre-book/service-level-objectives/&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Smith, Dave. “How to Metric.” Medium (blog), September 24, 2020. &lt;a href="https://medium.com/@djsmith42/how-to-metric-edafaf959fc7"&gt;https://medium.com/@djsmith42/how-to-metric-edafaf959fc7&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>devops</category>
      <category>aws</category>
      <category>cloudskills</category>
      <category>architecture</category>
    </item>
    <item>
      <title>Writing data to disk: transforming brittle code to robust code with atomic writes</title>
      <dc:creator>memattchung</dc:creator>
      <pubDate>Fri, 20 Aug 2021 03:13:34 +0000</pubDate>
      <link>https://dev.to/memattchung/writing-data-to-disk-transforming-brittle-code-to-robust-code-with-atomic-writes-5e3e</link>
      <guid>https://dev.to/memattchung/writing-data-to-disk-transforming-brittle-code-to-robust-code-with-atomic-writes-5e3e</guid>
      <description>&lt;p&gt;&lt;em&gt;This is the first post in a series where I'll cover writing robust code that can tolerate both expected and unexpected failures.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Problem identification
&lt;/h2&gt;

&lt;p&gt;Receiving feedback through code reviews is one of the many ways to grow your career as a software developer. But of course, not all feedback holds the same value. Not-so-useful comments tend to focus on nitpicking (e.g. white space); moderately useful comments detect logic or semantic bugs; fairly useful ones help you see problems through a different lens; the &lt;strong&gt;best comments&lt;/strong&gt; open your eyes to issues that you didn't even know existed.&lt;/p&gt;

&lt;p&gt;One of the most eye-opening code reviews I submitted during my tenure at Amazon Web Services (AWS) revealed to me the importance of &lt;strong&gt;atomic writes to disk.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Example: Brittle Non-Atomic write to disk
&lt;/h2&gt;

&lt;p&gt;Let's take a look at the snippet of Python code below that writes data to disk.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;dataset&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;fetch_data&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;...&lt;/span&gt;
&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nb"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'customers.txt'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'w'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;fh&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;customer&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;dataset&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;fh&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(...)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At a glance, the above code looks and smells 👃 okay. It's coded with idiomatic Python: the context manager (i.e. &lt;code&gt;with open&lt;/code&gt;) cleans up lingering resources for you, automatically closing out the file handle. Awesome. I see code like this &lt;a href="https://github.com/search?l=Python&amp;amp;q=%22with+open%22+%22write%22&amp;amp;type=Code"&gt;all the time&lt;/a&gt;. But, can you spot the issue?&lt;/p&gt;

&lt;p&gt;The lack of atomicity?&lt;/p&gt;

&lt;h2&gt;
  
  
  What is an atomic write?
&lt;/h2&gt;

&lt;p&gt;In general, an atomic operation is all or nothing, binary, 0 or 1; the operation has either 1) not yet started or 2) completed successfully. No gray areas. In the context of writing data to disk, the destination &lt;strong&gt;must&lt;/strong&gt; contain all the data we expect to be present in the file, non-corrupted. Not some of the data: all of it.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;So how do we transform the above code such that we atomically write to disk?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;As it stands, the above code is brittle, susceptible to failures. What happens if the program raises an exception mid-write? Or if the server powers off in between write operations, leaving the data corrupted? In other words, the code opens us up to leaving the file in an &lt;strong&gt;unknown state&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Atomically writing a file
&lt;/h2&gt;

&lt;p&gt;Here's how we go about atomically writing a file.&lt;/p&gt;

&lt;h3&gt;
  
  
  Steps
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Create a temporary file&lt;/li&gt;
&lt;li&gt;Write contents to temporary file&lt;/li&gt;
&lt;li&gt;Flush buffers&lt;/li&gt;
&lt;li&gt;Sync to disk&lt;/li&gt;
&lt;li&gt;Rename the temporary file to the destination file&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Example
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;tempfile&lt;/span&gt;

&lt;span class="c1"&gt;# 1. Create a temporary file next to the destination (same filesystem)&lt;/span&gt;
&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;tempfile&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NamedTemporaryFile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'w'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;dir&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'.'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;delete&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;fh&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# 2. Write contents to file handle&lt;/span&gt;
    &lt;span class="n"&gt;fh&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(...)&lt;/span&gt;
    &lt;span class="c1"&gt;# 3. Flush Python's buffers to the OS&lt;/span&gt;
    &lt;span class="n"&gt;fh&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;flush&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="c1"&gt;# 4. Sync from OS memory to disk&lt;/span&gt;
    &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fh&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fileno&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;

&lt;span class="c1"&gt;# 5. Rename and replace destination file&lt;/span&gt;
&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fh&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"customers.txt"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We start the procedure by opening a temporary file; this temporary file becomes the intermediate destination to which we direct our writes. By writing to a temporary file, we leave the ultimate destination file (if it exists) intact, only replacing the destination file if &lt;strong&gt;all the data has been successfully written to the temporary file&lt;/strong&gt;. Once all writes have finished, we simply rename the temporary file to that of the destination file, an operation that is itself atomic on POSIX systems when both files live on the same filesystem.&lt;/p&gt;
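
&lt;p&gt;For reuse, the five steps can be wrapped in a small helper. The sketch below is one possible shape, not a definitive implementation: the name &lt;code&gt;atomic_write&lt;/code&gt; is made up for this post, and it assumes text data and a destination directory you are allowed to create files in.&lt;/p&gt;

```python
import os
import tempfile


def atomic_write(path, data):
    """Write text to path so readers see either the old file or the new one."""
    directory = os.path.dirname(os.path.abspath(path))
    # Step 1: create the temp file in the destination directory so the final
    # rename stays on one filesystem.
    fd, temp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as fh:
            fh.write(data)         # Step 2: write contents
            fh.flush()             # Step 3: flush Python's buffers to the OS
            os.fsync(fh.fileno())  # Step 4: sync OS buffers to disk
        os.replace(temp_path, path)  # Step 5: rename over the destination
    except BaseException:
        # Leave the original file untouched and clean up the partial temp file.
        if os.path.exists(temp_path):
            os.remove(temp_path)
        raise
```

&lt;p&gt;Calling &lt;code&gt;atomic_write("customers.txt", contents)&lt;/code&gt; then either replaces the file completely or leaves the previous version as it was.&lt;/p&gt;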

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;Above, I demonstrated one way to apply atomicity. This principle can be applied to many other situations. For example, if you are writing multi-threaded code and accessing shared memory, a thread needs to atomically obtain a lock before modifying the underlying shared data structures.&lt;/p&gt;

&lt;p&gt;So, moving forward, when writing or reviewing code, keep the possibility of failures at the forefront of your mind and identify ways you can apply the principle of atomicity to turn fragile code into robust software.&lt;/p&gt;

&lt;h2&gt;
  
  
  Let's Connect
&lt;/h2&gt;

&lt;p&gt;Let's connect and talk more about software and devops. Follow me on Twitter: &lt;a href="https://twitter.com/memattchung"&gt;@memattchung&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;p&gt;&lt;a href="http://stupidpythonideas.blogspot.com/2014/07/getting-atomic-writes-right.html"&gt;Stupid Python Ideas: Getting atomic writes right&lt;/a&gt;&lt;br&gt;
&lt;a href="https://stackoverflow.com/questions/2333872/how-to-make-file-creation-an-atomic-operation"&gt;python - How to make file creation an atomic operation? - Stack Overflow&lt;/a&gt;&lt;/p&gt;

</description>
      <category>programming</category>
      <category>python</category>
      <category>codequality</category>
      <category>devops</category>
    </item>
    <item>
      <title>Monitoring Systems with Canaries</title>
      <dc:creator>memattchung</dc:creator>
      <pubDate>Sun, 20 Jun 2021 14:57:24 +0000</pubDate>
      <link>https://dev.to/memattchung/monitoring-systems-with-canaries-5fjh</link>
      <guid>https://dev.to/memattchung/monitoring-systems-with-canaries-5fjh</guid>
      <description>&lt;p&gt;You launched your service and are rapidly onboarding customers. You're moving fast, repeatedly deploying one new feature after another. But with the uptick in releases, bugs are creeping in and you're finding yourself having to troubleshoot, roll back, squash bugs, and then redeploy changes. Moving fast but breaking things. What can you do to quickly detect issues before your customers report them?&lt;/p&gt;

&lt;p&gt;Canaries.&lt;/p&gt;

&lt;p&gt;In this post, you'll learn about the concept of canaries, example code, best practices, and other considerations including both maintenance and financial implications with running them.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is a canary?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Fib1soaM--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/bdhi6m30jgb3ppx5zmec.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Fib1soaM--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/bdhi6m30jgb3ppx5zmec.jpg" alt="Source: grass-lifeisgood/Shutterstock"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Back in the early 1900s, canaries were used by miners for &lt;a href="https://www.smithsonianmag.com/smart-news/story-real-canary-coal-mine-180961570/"&gt;detecting carbon monoxide and other dangerous gases&lt;/a&gt;. Miners would bring their canaries down with them to the coal mine, and when their canary stopped chirping, it was time for everyone to immediately evacuate.&lt;/p&gt;

&lt;p&gt;In the context of computing systems, canaries perform end-to-end testing, aiming to exercise the entire software stack of your application: they behave like your end-users, emulating customer behavior. Canaries are just pieces of software that are always running and constantly monitoring the state of your system; they emit metrics into your monitoring system (more discussion on monitoring in a separate post), which then triggers an alarm when a defined threshold is breached.&lt;/p&gt;

&lt;h3&gt;
  
  
  What do canaries offer?
&lt;/h3&gt;

&lt;p&gt;Canaries answer the question: "Is my service running?" More sophisticated canaries can offer a deeper look into your service. Instead of canaries just emitting a binary 1 or 0 — up or down — they can be designed such that they emit more meaningful metrics that measure latency from the client's perspective.&lt;/p&gt;

&lt;h2&gt;
  
  
  First steps with building your canary
&lt;/h2&gt;

&lt;p&gt;If you don't have any canaries running that monitor your system, you don't necessarily have to start with rolling your own. Your first canary can require little to no code. One way to gain immediate visibility into your system would be to use synthetic monitoring services such as &lt;a href="https://betteruptime.com/?ref=7hzm"&gt;BetterUptime&lt;/a&gt;, Pingdom, or StatusCake. These services offer a web interface, allowing you to configure HTTP(S) endpoints that their canaries will periodically poll. When their systems detect an issue (e.g. TCP connection failing, bad HTTP response), they can send you email or text notifications.&lt;/p&gt;

&lt;p&gt;Or if your systems are deployed in Amazon Web Services, you can write Python or Node scripts that integrate with CloudWatch (&lt;a href="https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch_Synthetics_Canaries_Create.html"&gt;click here&lt;/a&gt; for Amazon CloudWatch documentation).&lt;/p&gt;

&lt;p&gt;But if you are interested in developing your own custom canaries that do more than a simple probe, read on.&lt;/p&gt;

&lt;h3&gt;
  
  
  Where to begin
&lt;/h3&gt;

&lt;p&gt;Remember, canaries should behave just like real customers. Your customer might be a real human being or another piece of software. Regardless of the type of customer, you'll want to start simple.&lt;/p&gt;

&lt;p&gt;Similar to the managed services described above, your first canary should start with emitting a simple metric into your monitoring system, indicating whether the endpoint is up or down. For example, if you have a web service, perform a vanilla HTTP GET. When successful, the canary will emit &lt;code&gt;http_get_homepage_success=1&lt;/code&gt; and under failure, &lt;code&gt;http_get_homepage_success=0&lt;/code&gt;.&lt;/p&gt;
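
&lt;p&gt;As a rough sketch of that first probe, the function below uses Python's standard &lt;code&gt;urllib.request&lt;/code&gt; and returns the 1 or 0 you would publish as the metric. The function name and the &lt;code&gt;opener&lt;/code&gt; parameter are assumptions made for illustration (the parameter exists so the probe can be exercised without a network), and publishing the value to your monitoring system is left out.&lt;/p&gt;

```python
import urllib.request


def http_get_homepage_success(url, opener=urllib.request.urlopen, timeout=5):
    """Return 1 if a plain HTTP GET succeeds with status 200, else 0."""
    try:
        with opener(url, timeout=timeout) as response:
            return 1 if response.status == 200 else 0
    except Exception:
        # Connection errors, timeouts, and HTTP error statuses count as "down".
        return 0
```

&lt;p&gt;Treating every exception as a failure keeps the canary itself from crashing when the service is unreachable, which is exactly when you need the metric most.&lt;/p&gt;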

&lt;h3&gt;
  
  
  Example canary - monitoring cache layer
&lt;/h3&gt;

&lt;p&gt;Imagine you have a simple key/value store system that serves as a caching layer. To monitor this layer, every minute our canary will: 1) perform a write 2) perform a read 3) validate the response.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;time&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sleep&lt;/span&gt;

&lt;span class="c1"&gt;# cache_put, cache_get, publish_metric and log_exception are helpers defined elsewhere&lt;/span&gt;
&lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;

    &lt;span class="n"&gt;successful_run&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;

    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;put_response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cache_put&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'foo'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'bar'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;write_successful&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;put_response&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s"&gt;'OK'&lt;/span&gt;
        &lt;span class="n"&gt;publish_metric&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'cache_engine_successful_write'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;write_successful&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cache_get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'foo'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;successful_read&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s"&gt;'bar'&lt;/span&gt;
        &lt;span class="n"&gt;publish_metric&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'cache_engine_successful_read'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;successful_read&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;successful_run&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;

    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="ne"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;log_exception&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Canary failed due to error: %s"&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="n"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;finally&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;publish_metric&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'cache_engine_canary_successful_run'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;successful_run&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="n"&gt;sleep_for_in_seconds&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;
        &lt;span class="n"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sleep_for_in_seconds&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Cache Engine failure during deployment
&lt;/h4&gt;

&lt;p&gt;With this canary in place emitting metrics, we might then choose to integrate the canary with our code deployment pipeline. In the example below, I triggered a code deployment (riddled with bugs) and the canary detected an issue, triggering an automatic rollback:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Lr893iwa--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/953w7dxvwboqj8f6usve.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Lr893iwa--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/953w7dxvwboqj8f6usve.png" alt="Canary detecting failures"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Best Practices
&lt;/h2&gt;

&lt;p&gt;The above code example is deliberately simple. When building your own canaries, keep the following best practices in mind:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Canaries should NOT interfere with the real user experience&lt;/strong&gt;. Although a good canary should test different behaviors/states of your system, it should in no way interfere with the real user experience. That is, its side effects should be self-contained.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;They should always be on, always running, and testing at regular intervals&lt;/strong&gt;. Ideally, the canary runs frequently (e.g. every 15 seconds or every minute).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The alarms that you create for your canary should trigger only on more than one data point&lt;/strong&gt;. If your alarms fire on a single data point, you increase the likelihood of false alarms, engaging your service teams unnecessarily.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Integrate the canary into your continuous integration/continuous deployment pipeline&lt;/strong&gt;. Essentially, the deployment system should monitor the metrics that the canary emits, and if an error is detected for more than N minutes, the deployment should automatically roll back (more on the safety of automated rollbacks in a separate post).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;When rolling your own canary, do more than just inspect the HTTP headers&lt;/strong&gt;. Success criteria should be more than verifying that the HTTP status code is a 200 OK. If your web service returns a JSON payload, analyze the payload and verify that it's both syntactically and semantically correct.&lt;/li&gt;
&lt;/ul&gt;
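
&lt;p&gt;To illustrate the point about alarming on more than one data point, here is a minimal sketch of that evaluation logic. The function name and the default of three consecutive failures are assumptions chosen for illustration; in practice your monitoring system (CloudWatch alarms, for example) would typically evaluate this for you.&lt;/p&gt;

```python
def should_alarm(datapoints, required_failures=3):
    """Return True when the most recent `required_failures` datapoints all failed.

    `datapoints` holds canary results, oldest first: 1 = success, 0 = failure.
    """
    recent = datapoints[-required_failures:]
    # Require a full window of failures so a single blip never pages anyone.
    return len(recent) == required_failures and all(point == 0 for point in recent)


print(should_alarm([1, 1, 0]))     # a single failed run: no alarm
print(should_alarm([1, 0, 0, 0]))  # three failures in a row: alarm
```

&lt;p&gt;The trade-off is detection time: with a canary that runs every minute, requiring three consecutive failures delays the alarm by roughly three minutes.&lt;/p&gt;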

&lt;h2&gt;
  
  
  Cost of canaries
&lt;/h2&gt;

&lt;p&gt;Of course, canaries are not free. Regardless of whether or not you rely on a third party service or roll your own, you'll need to be aware of the maintenance and financial costs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Maintenance
&lt;/h3&gt;

&lt;p&gt;A canary is just another piece of software. The underlying implementation may be just a few bash scripts cobbled together or a full-blown client application. In either case, you need to maintain it just like any other code package.&lt;/p&gt;

&lt;h3&gt;
  
  
  Financial Costs
&lt;/h3&gt;

&lt;p&gt;How often is the canary running? How many instances of the canary are running? Are they geographically distributed to test from different locations? These are some of the questions that you must ask since they impact the cost of running them. &lt;/p&gt;

&lt;h2&gt;
  
  
  Beyond canaries
&lt;/h2&gt;

&lt;p&gt;When building systems, you want a canary that behaves like your customer, one that allows you to quickly detect issues as soon as your service(s) chokes. If you are vending an API, then your canary should exercise the different URIs. If you are testing the front end, then your canary can be programmed to mimic a customer using a browser, with libraries such as Selenium.&lt;/p&gt;

&lt;p&gt;Canaries are a great place to start if you are just launching a service. But there's a lot more work required to create an operationally robust service. You'll want to inject failures into your system. You'll want a crystal clear understanding of how your system should behave when its dependencies fail. These are some of the topics that I'll cover in the next series of blog posts.&lt;/p&gt;

&lt;h2&gt;
  
  
  Let's Connect
&lt;/h2&gt;

&lt;p&gt;Let's connect and talk more about software and devops. Follow me on Twitter: &lt;a href="https://twitter.com/memattchung"&gt;@memattchung&lt;/a&gt;&lt;/p&gt;

</description>
      <category>programming</category>
      <category>cloudskills</category>
      <category>devops</category>
    </item>
    <item>
      <title>3 Tips on getting eyeballs on your code review</title>
      <dc:creator>memattchung</dc:creator>
      <pubDate>Mon, 14 Jun 2021 15:36:49 +0000</pubDate>
      <link>https://dev.to/memattchung/3-tips-on-getting-eyeballs-on-your-code-review-735</link>
      <guid>https://dev.to/memattchung/3-tips-on-getting-eyeballs-on-your-code-review-735</guid>
      <description>&lt;p&gt;"Why is nobody reviewing my code?"&lt;/p&gt;

&lt;p&gt;I sometimes witness new engineers (or even seasoned engineers new to the company) submit code reviews that end up sitting idle, gaining zero traction. Often, these code reviews get published but comments never flow in, leaving the developer scratching their head, wondering why nobody seems to be taking a look. To help avoid this situation, check out the 3 tips below for more effective code reviews.&lt;/p&gt;

&lt;h2&gt;
  
  
  3 tips for more effective code reviews
&lt;/h2&gt;

&lt;p&gt;Try out the three tips for more effective code reviews. In short, you should:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Assume nobody cares&lt;/li&gt;
&lt;li&gt;Strive for bite sized changes&lt;/li&gt;
&lt;li&gt;Add a descriptive summary&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  1. Assume nobody cares
&lt;/h3&gt;

&lt;p&gt;After you hit the publish button, don't expect other developers to flock to your code review. In fact, it's safe to assume that nobody cares. I know, that sounds a bit harsh but as Neil Strauss suggests:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Your challenge is to assume — to count on — the completely apathy of the reader. And from there, make them interested.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;At some point in our careers, we all fall into this trap.&lt;/p&gt;

&lt;p&gt;We send out a review, one that lacks a clear description (see the section “Add a descriptive summary” below), and then the code review sits there, patiently waiting for someone to sprinkle comments. Sometimes, those comments never come.&lt;/p&gt;

&lt;p&gt;Okay, it's not necessarily that people don't care. It has more to do with the fact that people are busy with their own tasks and deliverables. They too are writing code that they are trying to ship.&lt;/p&gt;

&lt;p&gt;So your code review essentially pulls them away from delivering their own work. So, make it as easy as possible for them to review.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;One way to gain their attention is simply by giving them a heads up.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Before publishing your code review, send your reviewer an instant message or e-mail. Or if you are having a meeting with that person, tell them that you plan on sending out a code review and ask whether they can take a look. This puts your code review on their radar. And if you don't see traction within an appropriate amount of time (which varies, depending on the change and its criticality), then follow up with them.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Strive for bite sized code reviews
&lt;/h3&gt;

&lt;p&gt;Any change beyond 100-200 lines of code requires a significant amount of mental energy to review (unless the change itself is a trivial update to comments or formatting). So how can you make it easier for your reviewer?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Aim for small, bite sized code reviews.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In my experience, a good rule of thumb is to submit fewer than 100 lines of code.&lt;/p&gt;

&lt;p&gt;What if there’s no way your change can squeeze into double digits?&lt;/p&gt;

&lt;p&gt;Then consider breaking down the single code review into multiple, smaller sized code reviews and once all those independent code reviews are approved, submit a single code review that merges all those changes in atomically.&lt;/p&gt;

&lt;p&gt;And if you still cannot break down a large code review into these lengths and find that it’s unavoidable to submit a large code review, then make sure you &lt;strong&gt;schedule a 15-30 minute meeting to discuss your large code review&lt;/strong&gt; (I’ll create a separate blog post for this).&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Add a descriptive summary for the change
&lt;/h3&gt;

&lt;p&gt;I’m not suggesting you write a miniature novel when adding a description to your code review. But you’ll definitely need to write something with more substance than a one-liner like “Adds new module”. Rob Pike puts it succinctly: his criteria for a good description include “What, why, and background”.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--zfmgMwKI--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/w154xllxum2cypl7bf8k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--zfmgMwKI--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/w154xllxum2cypl7bf8k.png" alt="Good example of a change summary from IntelDPDK package" width="880" height="296"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In addition to covering these criteria, be sure to describe how you tested your code or, better yet, ship your code review with unit tests. Brownie points if you explicitly call out what is out of scope. Limiting your scope reduces the possibility of unnecessary back-and-forth comments about changes that fall outside it.&lt;/p&gt;

&lt;p&gt;Finally, if you want some stricter guidelines on how to write a good commit message, you might want to check out Kabir Nazir’s blog post on &lt;a href="https://ac-blog.vercel.app/blog/how-to-write-good-git-commit-message"&gt;“How to write good commit messages.”&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;If you are having trouble getting traction on your code reviews, try the above tips. Remember, it's on you, the submitter of the code review, to make it as easy as possible for your reviewers to leave comments (and approve).&lt;/p&gt;

&lt;h3&gt;
  
  
  Let's Connect
&lt;/h3&gt;

&lt;p&gt;Let's chat more and connect! Follow me on Twitter: &lt;a class="mentioned-user" href="https://dev.to/memattchung"&gt;@memattchung&lt;/a&gt;&lt;/p&gt;

</description>
      <category>programming</category>
      <category>codereview</category>
      <category>beginners</category>
      <category>career</category>
    </item>
    <item>
      <title>3 project management tips for the Well-Rounded Software Developer</title>
      <dc:creator>memattchung</dc:creator>
      <pubDate>Wed, 09 Jun 2021 22:34:58 +0000</pubDate>
      <link>https://dev.to/memattchung/3-project-management-tips-for-the-well-rounded-software-developer-275n</link>
      <guid>https://dev.to/memattchung/3-project-management-tips-for-the-well-rounded-software-developer-275n</guid>
      <description>&lt;p&gt;This is the second in the series of The Well Rounded Developer. See previous post &lt;a href="https://dev.to/memattchung/why-all-developers-should-learn-how-to-perform-basic-network-troubleshooting-2ihj"&gt;"Network Troubleshooting for the Well-Rounded Developer"&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Whether you are a solo developer working directly with your clients, or a software engineer on a larger team that's delivering a large feature or service, you need to do more than just ship code. To succeed in your role, you also need good project management skills, regardless of whether there's an officially assigned "project manager". By upping your project management skills, you'll increase the odds of delivering consistently and on time, which is necessary for earning trust among your peers and stakeholders. In fact, I'd go as far as to say that it's critical for your &lt;a href="https://dev.to/lpasqualis/personal-brand-development-for-software-engineers-ac6"&gt;Personal Brand&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  3 Project Management Tips
&lt;/h2&gt;

&lt;p&gt;Just like programming, project management is another skill that requires practice: you'll get better at it over time. Sometimes you'll grossly underestimate a task, thinking it'll take 3 days ... when it really takes 10 days (or more!). Don't sweat it. Project management gets easier the more you do it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Capturing Requirements
&lt;/h3&gt;

&lt;p&gt;This seems obvious and almost goes without saying, but as a developer, you need to be able to extract the mental image your customer or product manager has in mind. Then, distill it into words, often referred to as &lt;a href="https://www.mountaingoatsoftware.com/agile/user-stories" rel="noopener noreferrer"&gt;user stories&lt;/a&gt;: "When I do X, Y happens" or "As a [role] ... I want [goal] ... so that [benefit]."&lt;/p&gt;

&lt;p&gt;These conversations will require a lot of back and forth discussion. With each iteration, aim to be as specific as possible. Include numbers, pictures, diagrams. The more detail, the better. And most important, beyond defining your acceptance criteria, spell out your assumptions — loud and clear. Because if any of the assumptions get violated while working on the task, you need to sound the alarm and communicate (see "sending frequent communication updates" below) that the current estimated time has been derailed.&lt;/p&gt;

&lt;h4&gt;
  
  
  Example
&lt;/h4&gt;

&lt;blockquote&gt;
&lt;p&gt;When we receive a packet with a length exceeding the maximum transmission unit (MTU) of 1514 bytes, the packet gets dropped and the counter "num_dropped_packets_exceeding_mtu" is incremented.&lt;/p&gt;
&lt;/blockquote&gt;
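
&lt;p&gt;A requirement this specific translates almost directly into code and a test. The sketch below is hypothetical: the &lt;code&gt;PacketReceiver&lt;/code&gt; class and the &lt;code&gt;MTU_BYTES&lt;/code&gt; constant are names made up for illustration, and only the counter name and the 1514-byte limit come from the example above.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;MTU_BYTES = 1514  # limit taken from the example requirement


class PacketReceiver:
    """Hypothetical receiver that enforces the requirement above."""

    def __init__(self):
        self.num_dropped_packets_exceeding_mtu = 0
        self.accepted = []

    def receive(self, packet):
        """Accept the packet, or drop it and count the drop when oversized."""
        bytes_over_limit = max(0, len(packet) - MTU_BYTES)
        if bytes_over_limit:
            self.num_dropped_packets_exceeding_mtu += 1
            return False
        self.accepted.append(packet)
        return True


receiver = PacketReceiver()
receiver.receive(bytes(1514))  # exactly at the limit: accepted
receiver.receive(bytes(1515))  # one byte over: dropped and counted
print(receiver.num_dropped_packets_exceeding_mtu)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Because the requirement names the threshold and the counter, the acceptance test writes itself: send one packet at the limit, one just over it, and check that the counter moved exactly once.&lt;/p&gt;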

&lt;h3&gt;
  
  
  Sending frequent communication updates
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1c39l8q9rgzn31qwycx5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1c39l8q9rgzn31qwycx5.png" alt="Email status update"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Most importantly, keep your stakeholders in the loop. Regardless of whether the task at hand is trending on time, slipping behind, or being delivered ahead of schedule, send an update. That might be in the form of an e-mail, or closing out your task in your project management system.&lt;/p&gt;

&lt;h4&gt;
  
  
  Example of a short status update
&lt;/h4&gt;

&lt;p&gt;More often than not, we developers tend to send updates too infrequently and as a result, our stakeholders are often guessing where the project(s) stand. These updates can be short and simple: "Completed task X. Code has been pushed to feature branch but still needs to be merged into mainline and deployed through pipeline."&lt;/p&gt;

&lt;h2&gt;
  
  
  Breaking tasks into small deliverables
&lt;/h2&gt;

&lt;p&gt;It pays off to break down large chunks of work into small, actionable items.&lt;/p&gt;

&lt;p&gt;The smaller, the better. Ideally, although it's not always possible, strive to break down tasks such that each can be completed within a &lt;strong&gt;single day&lt;/strong&gt;. This isn't an absolute requirement but serves as a forcing function to crystallize requirements. Chances are, the larger the estimate, the greater the chance of the task slipping off schedule.&lt;/p&gt;

&lt;p&gt;Of course, some tasks just require more days, like fleshing out a design document. For ambiguous tasks, create &lt;a href="https://ancaonuta.medium.com/how-spikes-help-to-improve-your-agile-product-delivery-a0f104305911" rel="noopener noreferrer"&gt;spike stories (i.e. research tasks)&lt;/a&gt; — just make sure these discovery tasks are time-bounded to a few days.&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;Project management is an essential skill that every well-rounded developer must have in their toolbox. This skill combined with your technical depth will help you stand out as a strong developer: not someone who just delivers code, but someone who does it consistently and on time.&lt;/p&gt;

&lt;h2&gt;
  
  
  Let's connect
&lt;/h2&gt;

&lt;p&gt;Let's chat more about being a well-rounded software developer. If you are curious about learning how to move from front-end to back-end development, or from back-end development to low-level systems programming, follow me on Twitter: &lt;a class="mentioned-user" href="https://dev.to/memattchung"&gt;@memattchung&lt;/a&gt;&lt;/p&gt;

</description>
      <category>programming</category>
      <category>productivity</category>
      <category>career</category>
      <category>beginners</category>
    </item>
    <item>
      <title>Why all developers should learn how to perform basic network troubleshooting</title>
      <dc:creator>memattchung</dc:creator>
      <pubDate>Sun, 06 Jun 2021 04:51:03 +0000</pubDate>
      <link>https://dev.to/memattchung/why-all-developers-should-learn-how-to-perform-basic-network-troubleshooting-2ihj</link>
      <guid>https://dev.to/memattchung/why-all-developers-should-learn-how-to-perform-basic-network-troubleshooting-2ihj</guid>
      <description>&lt;p&gt;Regardless of whether you work on the front-end or back-end, I think all developers should gain some proficiency in network troubleshooting. This is especially true if you find yourself gravitating towards lower level systems programming. &lt;/p&gt;

&lt;p&gt;The ability to troubleshoot the network and systems separates good developers from great developers. Great developers understand not just code abstractions but also the TCP/IP model:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyjopozu278ydhdb5yhxp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyjopozu278ydhdb5yhxp.png" alt="OSI and TCP/IP model"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Source: &lt;a href="https://www.guru99.com/tcp-ip-model.html" rel="noopener noreferrer"&gt;https://www.guru99.com/tcp-ip-model.html&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Some basic network troubleshooting skills
&lt;/h2&gt;

&lt;p&gt;If you are just getting into networking, here are some basic tools you should add to your toolbelt:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Perform a DNS query (e.g. &lt;code&gt;dig&lt;/code&gt; or &lt;code&gt;nslookup&lt;/code&gt; command)&lt;/li&gt;
&lt;li&gt;Send an ICMP echo request to test end to end IP connectivity (i.e. &lt;code&gt;ping&lt;/code&gt; command)&lt;/li&gt;
&lt;li&gt;Analyze the various network hops (i.e. &lt;code&gt;traceroute X.X.X.X&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Check whether you can establish a TCP socket connection (e.g. &lt;code&gt;telnet X.X.X.X [port]&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Test the application layer (e.g. &lt;code&gt;curl https://somedomain&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Perform a packet capture (e.g. &lt;code&gt;tcpdump -i any&lt;/code&gt;) and inspect what bits are sent on the wire&lt;/li&gt;
&lt;/ul&gt;
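
&lt;p&gt;If you would rather script the DNS and TCP checks from the list above, Python's standard &lt;code&gt;socket&lt;/code&gt; module can approximate them. This is a rough sketch, with &lt;code&gt;resolve_host&lt;/code&gt; and &lt;code&gt;can_connect&lt;/code&gt; as hypothetical helper names. Note that &lt;code&gt;getaddrinfo&lt;/code&gt; goes through the system resolver (hosts file, local caches) instead of querying DNS servers directly the way &lt;code&gt;dig&lt;/code&gt; does, so the results can differ.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import socket


def resolve_host(hostname):
    """Return the sorted, de-duplicated IP addresses the name resolves to.

    Raises socket.gaierror when the name cannot be resolved.
    """
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address as a string.
    return sorted({info[4][0] for info in infos})


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;For example, &lt;code&gt;can_connect(resolve_host("dev.to")[0], 443)&lt;/code&gt; attempts roughly the same connection as the &lt;code&gt;telnet&lt;/code&gt; check, minus the interactive session.&lt;/p&gt;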

&lt;h2&gt;
  
  
  What IP address is my browser connecting to?
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;% dig dev.to

; &amp;lt;&amp;lt;&amp;gt;&amp;gt; DiG 9.10.6 &amp;lt;&amp;lt;&amp;gt;&amp;gt; dev.to
;; global options: +cmd
;; Got answer:
;; -&amp;gt;&amp;gt;HEADER&amp;lt;&amp;lt;- opcode: QUERY, status: NOERROR, id: 39029
;; flags: qr rd ra; QUERY: 1, ANSWER: 4, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;dev.to.                IN  A

;; ANSWER SECTION:
dev.to.         268 IN  A   151.101.2.217
dev.to.         268 IN  A   151.101.66.217
dev.to.         268 IN  A   151.101.130.217
dev.to.         268 IN  A   151.101.194.217
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Is the web server listening on the HTTPS port?
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;% telnet 151.101.2.217 443
Trying 151.101.2.217...
Connected to 151.101.2.217.
Escape character is '^]'.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each of the above tools helps you isolate connectivity issues. For example, if your client receives an HTTP 5XX error, you can immediately rule out a TCP-level issue. That is, you don't need to use &lt;code&gt;telnet&lt;/code&gt; to check whether a firewall is blocking traffic or whether the server is listening on the right port: the server already sent an application-level response.&lt;/p&gt;
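
&lt;p&gt;The same elimination logic can be written down as a small decision function. This is only a sketch under simplifying assumptions: &lt;code&gt;likely_failure_layer&lt;/code&gt; and its return labels are hypothetical, and it treats any HTTP response as proof that name resolution and TCP worked, which ignores cases such as a proxy or load balancer answering on the server's behalf.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def likely_failure_layer(resolved, connected, http_status):
    """Map observations to the lowest layer that looks broken.

    resolved and connected are booleans from the DNS and TCP checks;
    http_status is an int, or None when no HTTP response came back.
    """
    if http_status is not None:
        # A response at the application layer means the layers below it
        # carried traffic, so the DNS and TCP results can be skipped.
        if http_status // 100 == 5:
            return "application (server-side error)"
        if http_status // 100 == 4:
            return "application (client-side request)"
        return "none"
    if not resolved:
        return "dns"
    if not connected:
        return "tcp (firewall, routing, or nothing listening on the port)"
    return "application (connected, but no HTTP response)"


print(likely_failure_layer(True, True, 503))
print(likely_failure_layer(True, False, None))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Checking the highest layer first is the design choice here: one successful application-level exchange saves you from running every lower-level tool.&lt;/p&gt;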

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;Learning more about the network stack helps you quickly pinpoint and isolate problems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Is it my client-side application?&lt;/li&gt;
&lt;li&gt;Is it a firewall blocking certain ports?&lt;/li&gt;
&lt;li&gt;Is there a transient issue on the network?&lt;/li&gt;
&lt;li&gt;Is the server up and running?&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Let's chat more about network engineering and software development
&lt;/h3&gt;

&lt;p&gt;If you are curious about learning how to move from front-end to back-end development, or from back-end development to low level systems programming, hit me up on Twitter: &lt;a href="https://twitter.com/memattchung" rel="noopener noreferrer"&gt;@memattchung&lt;/a&gt;&lt;/p&gt;

</description>
      <category>career</category>
      <category>devops</category>
      <category>programming</category>
      <category>beginners</category>
    </item>
  </channel>
</rss>
