<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Tom Williams</title>
    <description>The latest articles on DEV Community by Tom Williams (@tomwilliamscloud).</description>
    <link>https://dev.to/tomwilliamscloud</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1468915%2Ffd5ae630-0f0d-4155-9ac9-70df4133e2a5.png</url>
      <title>DEV Community: Tom Williams</title>
      <link>https://dev.to/tomwilliamscloud</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/tomwilliamscloud"/>
    <language>en</language>
    <item>
      <title>Lessons from Migrating 9TB of File Shares to FSx</title>
      <dc:creator>Tom Williams</dc:creator>
      <pubDate>Sun, 22 Mar 2026 14:00:29 +0000</pubDate>
      <link>https://dev.to/tomwilliamscloud/lessons-from-migrating-9tb-of-file-shares-to-fsx-4230</link>
      <guid>https://dev.to/tomwilliamscloud/lessons-from-migrating-9tb-of-file-shares-to-fsx-4230</guid>
      <description>&lt;p&gt;Migrating a Windows file server sounds straightforward until you're staring at 9TB of data across 14 shares and trying to work out what's actually worth moving.&lt;/p&gt;

&lt;p&gt;This is what I learned doing exactly that — moving a legacy EC2-hosted Windows file server to FSx for Windows File Server, with a detour through S3 Glacier for the data nobody was using.&lt;/p&gt;

&lt;h2&gt;Start with discovery, not migration&lt;/h2&gt;

&lt;p&gt;The temptation is to spin up FSx, robocopy everything across, and call it done. Resist that. You'll end up paying FSx prices for terabytes of data that hasn't been touched in years.&lt;/p&gt;

&lt;p&gt;I wrote a PowerShell script to scan every share and classify files by age. This immediately surfaced that a significant portion of the data was cold — files that hadn't been written to in over two years.&lt;/p&gt;
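&lt;p&gt;The actual script was PowerShell; a minimal Python sketch of the same classification idea, keyed on &lt;code&gt;LastWriteTime&lt;/code&gt; (mtime) with an illustrative two-year threshold, looks like this:&lt;/p&gt;

```python
import os
import time

COLD_AGE_SECONDS = 2 * 365 * 24 * 3600  # roughly two years

def classify_share(root, now=None):
    """Bucket every file under a share into 'hot' or 'cold' by mtime (LastWriteTime)."""
    now = now or time.time()
    totals = {"hot": {"count": 0, "bytes": 0}, "cold": {"count": 0, "bytes": 0}}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # skip files that vanish or are locked mid-scan
            bucket = "cold" if now - st.st_mtime > COLD_AGE_SECONDS else "hot"
            totals[bucket]["count"] += 1
            totals[bucket]["bytes"] += st.st_size
    return totals
```

&lt;p&gt;Summing bytes per bucket rather than just counting files is what makes the cost argument: a handful of huge cold files matters more than thousands of tiny ones.&lt;/p&gt;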

&lt;h2&gt;The LastAccessTime trap&lt;/h2&gt;

&lt;p&gt;Here's the gotcha that cost me a day: the server had &lt;code&gt;DisableLastAccess&lt;/code&gt; set to &lt;code&gt;1&lt;/code&gt;. This is a common Windows performance optimisation, but it means &lt;code&gt;LastAccessTime&lt;/code&gt; is unreliable — it wasn't being updated when files were read.&lt;/p&gt;

&lt;p&gt;That left &lt;code&gt;LastWriteTime&lt;/code&gt; as the only trustworthy timestamp. It's a reasonable proxy (if nobody's modified a file in two years, it's probably cold), but it's not perfect. A file that's read daily but never edited would appear cold.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The fix&lt;/strong&gt;: I enabled &lt;code&gt;LastAccessTime&lt;/code&gt; tracking (&lt;code&gt;fsutil behavior set DisableLastAccess 0&lt;/code&gt;) early in the project and let it run for a few weeks before the final classification scan. This gave us a more accurate picture before committing to the archival decisions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lesson&lt;/strong&gt;: Check &lt;code&gt;fsutil behavior query DisableLastAccess&lt;/code&gt; on day one of any file migration project.&lt;/p&gt;

&lt;h2&gt;Archive before you migrate&lt;/h2&gt;

&lt;p&gt;With the data classified, the approach was:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Archive cold data to S3 Glacier (cheap, still retrievable if needed)&lt;/li&gt;
&lt;li&gt;Migrate only active data to FSx&lt;/li&gt;
&lt;li&gt;Keep the original EC2 instance read-only for a transition period&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This significantly reduced the FSx storage footprint and brought the monthly cost down to something sensible.&lt;/p&gt;
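&lt;p&gt;Step 1 can be as simple as uploading with the &lt;code&gt;GLACIER&lt;/code&gt; storage class. A hedged boto3 sketch (the real pipeline used AWS DataSync, and the bucket name here is a made-up placeholder):&lt;/p&gt;

```python
def glacier_put_args(bucket, key):
    """Arguments for an S3 put_object call that lands straight in Glacier."""
    return {
        "Bucket": bucket,
        "Key": key,
        "StorageClass": "GLACIER",  # cheap at rest, retrievable via a restore request
    }

def archive_file(path, bucket, key):
    import boto3  # imported lazily so glacier_put_args stays dependency-free
    s3 = boto3.client("s3")
    with open(path, "rb") as f:
        s3.put_object(Body=f.read(), **glacier_put_args(bucket, key))
```

&lt;p&gt;For multi-gigabyte files you'd want the multipart &lt;code&gt;upload_file&lt;/code&gt; helper instead of a single &lt;code&gt;put_object&lt;/code&gt;, plus a manifest of everything archived so verification is possible later.&lt;/p&gt;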

&lt;h2&gt;Things I'd do differently&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Automate the archival pipeline end-to-end&lt;/strong&gt;: I used a semi-manual process with AWS DataSync. Next time I'd script the full workflow including verification and cleanup.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Set up monitoring on FSx from day one&lt;/strong&gt;: Storage growth on FSx can surprise you. CloudWatch alarms on free storage space are essential.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Communicate the archive process to users early&lt;/strong&gt;: People get nervous when they hear "we're archiving your files." Setting expectations about retrieval times and the safety net of Glacier avoids unnecessary panic.&lt;/li&gt;
&lt;/ul&gt;
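&lt;p&gt;On the monitoring point: the metric to watch for FSx for Windows is &lt;code&gt;FreeStorageCapacity&lt;/code&gt; in the &lt;code&gt;AWS/FSx&lt;/code&gt; namespace. A sketch of the alarm definition (threshold, evaluation periods, and the SNS topic are placeholders):&lt;/p&gt;

```python
def fsx_free_storage_alarm(file_system_id, threshold_bytes, sns_topic_arn):
    """put_metric_alarm arguments for low free storage on one FSx file system."""
    return {
        "AlarmName": f"fsx-free-storage-{file_system_id}",
        "Namespace": "AWS/FSx",
        "MetricName": "FreeStorageCapacity",
        "Dimensions": [{"Name": "FileSystemId", "Value": file_system_id}],
        "Statistic": "Minimum",
        "Period": 300,
        "EvaluationPeriods": 3,  # sustained low storage, not a blip
        "Threshold": threshold_bytes,
        "ComparisonOperator": "LessThanThreshold",
        "AlarmActions": [sns_topic_arn],
    }
```

&lt;p&gt;Pass the result to &lt;code&gt;boto3.client("cloudwatch").put_metric_alarm(**args)&lt;/code&gt;, or express the same thing as an &lt;code&gt;aws_cloudwatch_metric_alarm&lt;/code&gt; resource in Terraform.&lt;/p&gt;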

&lt;h2&gt;Was FSx worth it?&lt;/h2&gt;

&lt;p&gt;Yes. Automated backups, native AD integration, no more patching a Windows Server instance, and the storage scales without us managing disks. The migration was a few weeks of focused work, but the operational overhead dropped permanently.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>fsx</category>
      <category>migration</category>
      <category>powershell</category>
    </item>
    <item>
      <title>Why Event-Driven Infrastructure Beats Cron Jobs</title>
      <dc:creator>Tom Williams</dc:creator>
      <pubDate>Sun, 22 Mar 2026 12:51:30 +0000</pubDate>
      <link>https://dev.to/tomwilliamscloud/why-event-driven-infrastructure-beats-cron-jobs-1l8d</link>
      <guid>https://dev.to/tomwilliamscloud/why-event-driven-infrastructure-beats-cron-jobs-1l8d</guid>
      <description>&lt;p&gt;If you've spent any time managing infrastructure at scale, you've probably written a cron job that polls for something. Maybe it checks for untagged resources every hour, or scans for missing CloudWatch alarms on a schedule. It works. It's simple. And it's almost always the wrong long-term answer.&lt;/p&gt;

&lt;p&gt;I recently rebuilt one of these systems — a compliance remediation tool that ensures every EC2 instance in our multi-account AWS organisation has CloudWatch CPU alarms — and the shift from scheduled polling to event-driven architecture made a surprising difference.&lt;/p&gt;

&lt;h2&gt;The cron approach&lt;/h2&gt;

&lt;p&gt;The original setup ran a Lambda on a CloudWatch Events schedule every 30 minutes. It would:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Assume a role into each member account&lt;/li&gt;
&lt;li&gt;List all EC2 instances&lt;/li&gt;
&lt;li&gt;Check for the existence of CloudWatch alarms&lt;/li&gt;
&lt;li&gt;Create any that were missing&lt;/li&gt;
&lt;/ol&gt;
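&lt;p&gt;Condensed, the sweep looked roughly like this per account (the alarm naming convention and the session wiring are illustrative):&lt;/p&gt;

```python
def missing_alarm_instances(instance_ids, alarm_names, alarm_prefix="cpu-high-"):
    """Pure gap computation: instances with no alarm matching the naming convention."""
    covered = {n[len(alarm_prefix):] for n in alarm_names if n.startswith(alarm_prefix)}
    return [i for i in instance_ids if i not in covered]

def sweep_account(session, alarm_prefix="cpu-high-"):
    """One member account's scan: list instances, list alarms, compute the gap."""
    ec2 = session.client("ec2")
    cloudwatch = session.client("cloudwatch")
    instance_ids = [
        inst["InstanceId"]
        for page in ec2.get_paginator("describe_instances").paginate()
        for reservation in page["Reservations"]
        for inst in reservation["Instances"]
    ]
    alarm_names = [
        alarm["AlarmName"]
        for page in cloudwatch.get_paginator("describe_alarms").paginate()
        for alarm in page["MetricAlarms"]
    ]
    return missing_alarm_instances(instance_ids, alarm_names, alarm_prefix)
```

&lt;p&gt;Every run pays for both paginated listings in every account, whether or not anything has changed since the last run.&lt;/p&gt;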

&lt;p&gt;This worked, but had problems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Latency&lt;/strong&gt;: A new instance could run for up to 30 minutes without monitoring&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost&lt;/strong&gt;: Every run scanned every instance, even if nothing had changed&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Complexity&lt;/strong&gt;: The Lambda needed to handle pagination across dozens of accounts, manage rate limiting, and deal with partial failures gracefully&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Noise&lt;/strong&gt;: CloudWatch Logs filled up with successful "nothing to do" runs&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;The event-driven approach&lt;/h2&gt;

&lt;p&gt;The replacement uses EventBridge rules deployed to each member account via StackSets. When an EC2 instance launches or has its tags modified, the event is forwarded to a central event bus where a Lambda evaluates and applies alarms.&lt;/p&gt;

&lt;p&gt;The reconciliation Lambda still exists — it runs daily as a safety net — but it catches edge cases rather than doing the heavy lifting.&lt;/p&gt;
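&lt;p&gt;The per-instance handler ends up tiny. A hedged sketch of the instance-launch path only (alarm settings and the SNS topic are placeholders, and a production version would likely also consult tags before deciding):&lt;/p&gt;

```python
def build_cpu_alarm(instance_id, sns_topic_arn):
    """Standard CPU alarm definition for one instance (placeholder settings)."""
    return {
        "AlarmName": f"cpu-high-{instance_id}",
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,
        "EvaluationPeriods": 2,
        "Threshold": 90.0,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],
    }

def handler(event, context=None):
    """Invoked once per forwarded EC2 event; touches exactly one instance."""
    detail = event.get("detail", {})
    if detail.get("state") != "running":
        return None  # only newly running instances need an alarm created
    import boto3
    instance_id = detail["instance-id"]
    boto3.client("cloudwatch").put_metric_alarm(
        **build_cpu_alarm(instance_id, "arn:aws:sns:eu-west-2:111111111111:ops-alerts")
    )
    return instance_id
```

&lt;p&gt;No pagination, no cross-account fan-out, no partial-failure bookkeeping: one event in, one alarm out.&lt;/p&gt;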

&lt;h2&gt;What changed&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Remediation time&lt;/strong&gt;: From up to 30 minutes to under 60 seconds&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lambda invocations&lt;/strong&gt;: Dropped significantly — we only run when something actually happens&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Code complexity&lt;/strong&gt;: The event-driven Lambda handles one instance at a time, not a full cross-account sweep&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Terraform&lt;/strong&gt;: The module became simpler because each component has a single, clear responsibility&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;When cron still wins&lt;/h2&gt;

&lt;p&gt;Event-driven isn't always the answer. Use scheduled runs when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;There's no reliable event source for the change you care about&lt;/li&gt;
&lt;li&gt;You need a full reconciliation sweep (drift detection, for example)&lt;/li&gt;
&lt;li&gt;The event volume is high enough that reacting to every event would cost more than periodic polling&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But for "react when something changes" — which is what most compliance automation is doing — EventBridge is the better tool.&lt;/p&gt;

&lt;h2&gt;Getting started&lt;/h2&gt;

&lt;p&gt;If you're currently running a polling Lambda and want to shift:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Identify the AWS API action that triggers the change you care about&lt;/li&gt;
&lt;li&gt;Create an EventBridge rule matching that event pattern&lt;/li&gt;
&lt;li&gt;Keep your existing Lambda as a daily reconciliation fallback&lt;/li&gt;
&lt;li&gt;Deploy the rule to member accounts via StackSets&lt;/li&gt;
&lt;/ol&gt;
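&lt;p&gt;For step 2, the event pattern for the instance-launch case looks like this (tag changes arrive separately as &lt;code&gt;aws.tag&lt;/code&gt; / &lt;code&gt;Tag Change on Resource&lt;/code&gt; events):&lt;/p&gt;

```json
{
  "source": ["aws.ec2"],
  "detail-type": ["EC2 Instance State-change Notification"],
  "detail": {
    "state": ["running"]
  }
}
```

&lt;p&gt;Attach a rule with this pattern to the default bus in each member account, with the central event bus ARN as the rule's target.&lt;/p&gt;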

&lt;p&gt;The two patterns complement each other. Events handle the real-time path, scheduled runs handle the "trust but verify" path.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>eventbridge</category>
      <category>automation</category>
      <category>terraform</category>
    </item>
  </channel>
</rss>
