<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: 박준희</title>
    <description>The latest articles on DEV Community by 박준희 (@junhee916).</description>
    <link>https://dev.to/junhee916</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3964655%2F447f5509-845c-4cd0-8de8-a2cf635e18bb.jpg</url>
      <title>DEV Community: 박준희</title>
      <link>https://dev.to/junhee916</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/junhee916"/>
    <language>en</language>
    <item>
      <title>GitHub Actions vs. GCE Pull Poller: My Single VM Deployment Battle</title>
      <dc:creator>박준희</dc:creator>
      <pubDate>Wed, 03 Jun 2026 07:13:50 +0000</pubDate>
      <link>https://dev.to/junhee916/github-actions-vs-gce-pull-poller-my-single-vm-deployment-battle-2m54</link>
      <guid>https://dev.to/junhee916/github-actions-vs-gce-pull-poller-my-single-vm-deployment-battle-2m54</guid>
      <description>&lt;p&gt;Running a full AI product, aicoreutility.com, on a single, modest VM is a constant exercise in resource management and engineering pragmatism. Most days, it's a quiet hum of activity. But every so often, a bug or a deployment hiccup reminds me of the fragility of this setup. One such incident, which recurred a few times, was the dreaded empty &lt;code&gt;.next&lt;/code&gt; directory after a deployment.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Symptom: A Ghost in the Machine
&lt;/h2&gt;

&lt;p&gt;The symptom was always the same: after a deploy, the website would become inaccessible, returning a 500 error. Digging into the logs, the culprit was consistently an empty or corrupted &lt;code&gt;.next&lt;/code&gt; directory. This directory is where Next.js outputs the built application, and without it, the server has nothing to serve.&lt;/p&gt;

&lt;p&gt;This wasn't a random occurrence. It seemed to happen most often during automated deployments triggered by pushes to GitHub. My initial thought was that the build process itself was failing, perhaps due to resource constraints on the VM. I'd check the build logs, and sometimes they'd show errors, but other times, they'd appear to complete successfully, only for the &lt;code&gt;.next&lt;/code&gt; directory to be missing or empty upon inspection.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Wrong Turns: Chasing Shadows
&lt;/h2&gt;

&lt;p&gt;My first few attempts to fix this involved tweaking the build process itself. I experimented with different build commands, ensuring all dependencies were correctly installed, and even tried increasing the VM's memory temporarily. I also looked at the deployment scripts, trying to add more robust error checking.&lt;/p&gt;

&lt;p&gt;One of the key parts of my deployment strategy was an atomic swap. After a successful build, the new build artifacts (in a temporary directory like &lt;code&gt;.next.new&lt;/code&gt;) would be swapped with the current production directory (&lt;code&gt;.next&lt;/code&gt;). This ensures that if the new build fails, the old one remains untouched. However, in this scenario, it seemed like the swap was either not happening correctly, or the build itself was producing an empty output before the swap even occurred.&lt;/p&gt;

&lt;p&gt;I also had a watchdog script running, but it was primarily focused on the PM2 process and basic server health checks. It wasn't sophisticated enough to detect an empty &lt;code&gt;.next&lt;/code&gt; directory as a critical failure before it caused a full outage. The PM2 process would still be running, but it would be serving nothing.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Root Cause: A Race Condition on a Single VM
&lt;/h2&gt;

&lt;p&gt;The recurring nature of the problem, especially with automated pushes, pointed towards a timing or concurrency issue. The core problem was that my GitHub Actions workflow executed the build and swap operations within an external SSH session or job. If this session timed out, or the connection was interrupted for any reason, the atomic swap could be left incomplete. The new build artifacts might end up in &lt;code&gt;.next.new&lt;/code&gt;, but the swap to &lt;code&gt;.next&lt;/code&gt; would never finish, leaving the production directory empty or corrupted. The PM2 process, unaware of this underlying file system issue, would continue to run, leading to the 500 errors.&lt;/p&gt;

&lt;p&gt;Essentially, the external nature of the GitHub Actions build and deploy process, combined with the potential for transient network issues or session timeouts on a single VM, created a race condition. The build could succeed, but the critical final step of making it live could fail silently, leaving the application in a broken state.&lt;/p&gt;

&lt;p&gt;This was particularly insidious because the build logs might not always show a clear error. The process might just... stop. And because the watchdog was not designed to check the integrity of the &lt;code&gt;.next&lt;/code&gt; directory itself, nothing flagged the problem until the next request failed.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Fix: Embracing the Simpler Path
&lt;/h2&gt;

&lt;p&gt;The solution, as documented in my build logs, was to simplify the deployment process and eliminate the external dependency that was causing the race condition. I decided to move away from the GitHub Actions-driven build and swap and instead rely solely on a Google Compute Engine (GCE) pull poller.&lt;/p&gt;

&lt;p&gt;Here's how the new system works:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GitHub Actions Disabled for Auto-Deploy:&lt;/strong&gt; The &lt;code&gt;deploy.yml&lt;/code&gt; workflow was modified to only allow manual triggers (&lt;code&gt;workflow_dispatch&lt;/code&gt;). Automatic deployments via &lt;code&gt;on.push&lt;/code&gt; were commented out, effectively disabling them for routine use.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GCE Pull Poller:&lt;/strong&gt; A systemd timer (&lt;code&gt;riel-autodeploy.timer&lt;/code&gt;) is set to run every 90 seconds. This timer triggers a script (&lt;code&gt;auto_deploy_poll.sh&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Server-Side Build and Deploy:&lt;/strong&gt; The &lt;code&gt;auto_deploy_poll.sh&lt;/code&gt; script, in turn, calls &lt;code&gt;redeploy.sh auto&lt;/code&gt;. Crucially, this script runs entirely &lt;em&gt;on the server itself&lt;/em&gt;. It checks if a redeploy is necessary (e.g., if frontend, chat, or backend code has changed). If a redeploy is needed, it performs the build, smoke tests, and atomic swap all within the server's environment.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Flock and Gates:&lt;/strong&gt; The &lt;code&gt;redeploy.sh&lt;/code&gt; script now uses &lt;code&gt;flock&lt;/code&gt; to prevent multiple instances from running concurrently and includes more robust checks. It ensures that the build is complete and the resulting &lt;code&gt;.next&lt;/code&gt; directory is not empty and has a valid &lt;code&gt;BUILD_ID&lt;/code&gt; before proceeding with the swap. It also ignores temporary connection issues during PM2 reloads, only rolling back on genuine failures (4xx, 5xx, chunk 404s).&lt;/li&gt;
&lt;/ul&gt;
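
&lt;p&gt;The &lt;code&gt;flock&lt;/code&gt; guard from the list above could look roughly like the sketch below. It is not the actual &lt;code&gt;redeploy.sh&lt;/code&gt;: the function name &lt;code&gt;run_exclusive&lt;/code&gt;, the lock file location, and the exit code 75 are assumptions made for illustration. It relies on the &lt;code&gt;flock&lt;/code&gt; utility from util-linux.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# Hypothetical helper (not the author's script): run a command only if
# no other deploy currently holds the lock file.
# Usage: run_exclusive LOCKFILE COMMAND [ARGS...]
run_exclusive() {
  local lockfile="$1"
  shift
  (
    # Take the lock on file descriptor 9 without waiting.
    # Exit code 75 is an arbitrary choice meaning "another run is active".
    flock -n 9 || exit 75
    "$@"
  ) 9>"$lockfile"
}
```

&lt;p&gt;A timer-driven poller might call it as &lt;code&gt;run_exclusive /tmp/redeploy.lock ./redeploy.sh auto&lt;/code&gt; (the lock path is a placeholder). Because the timer fires every 90 seconds and a build can take longer than that, an overlapping run exits immediately instead of starting a second build on top of the first.&lt;/p&gt;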

&lt;p&gt;This approach eliminates the external session dependency. The build and swap happen in a controlled environment on the VM, making it far less susceptible to network interruptions or timeouts that plagued the GitHub Actions method. The watchdog script was also updated to check for the integrity of the &lt;code&gt;.next&lt;/code&gt; directory (non-empty, valid &lt;code&gt;BUILD_ID&lt;/code&gt;, sufficient chunks) rather than just relying on PM2's status.&lt;/p&gt;
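
&lt;p&gt;As a rough illustration of that integrity check, here is a minimal sketch. The function name &lt;code&gt;next_dir_healthy&lt;/code&gt; and the chunk threshold are hypothetical, and "valid &lt;code&gt;BUILD_ID&lt;/code&gt;" is simplified here to "the file exists and is not empty".&lt;/p&gt;

```shell
#!/usr/bin/env bash
# Hypothetical check (an assumption, not the author's watchdog):
# is this Next.js output directory fit to serve?
# Usage: next_dir_healthy DIR [MIN_CHUNKS]
next_dir_healthy() {
  local dir="$1"
  local min_chunks="${2:-1}"
  local chunk_count

  # The directory must exist and contain a non-empty BUILD_ID file.
  [ -d "$dir" ] || return 1
  [ -s "$dir/BUILD_ID" ] || return 1

  # The chunk folder must exist before we count anything in it.
  [ -d "$dir/static/chunks" ] || return 1

  # Count JavaScript files under static/chunks and compare to the threshold.
  chunk_count=$(( $(find "$dir/static/chunks" -type f -name '*.js' | wc -l) ))
  [ "$chunk_count" -ge "$min_chunks" ]
}
```

&lt;p&gt;A watchdog could run &lt;code&gt;next_dir_healthy .next 20&lt;/code&gt; on each tick and raise an alert or restore the previous build when it returns non-zero. The value 20 is only a placeholder; a sensible threshold depends on the size of the app.&lt;/p&gt;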

&lt;h2&gt;
  
  
  The Lesson: Simplicity is Robustness
&lt;/h2&gt;

&lt;p&gt;This incident reinforced a core principle of solo development on limited infrastructure: &lt;strong&gt;simplicity often breeds robustness&lt;/strong&gt;. While GitHub Actions is a powerful tool, its complexity introduced a failure mode that was difficult to debug on a single VM. By reverting to a simpler, server-side pull poller, I've created a deployment process that no longer depends on an external session staying alive. The occasional need to manually trigger a deploy via GitHub Actions is a small price to pay for the increased stability and reduced downtime.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;...building aicoreutility.com in the open...&lt;/em&gt; &lt;a href="https://aicoreutility.com" rel="noopener noreferrer"&gt;aicoreutility.com&lt;/a&gt;&lt;/p&gt;

</description>
      <category>deployment</category>
      <category>infra</category>
      <category>gce</category>
      <category>githubactions</category>
    </item>
    <item>
      <title>Next.js 14: 'Could not find the module in the React Client Manifest' — the real cause nobody tells you</title>
      <dc:creator>박준희</dc:creator>
      <pubDate>Wed, 03 Jun 2026 07:13:50 +0000</pubDate>
      <link>https://dev.to/junhee916/nextjs-14-could-not-find-the-module-in-the-react-client-manifest-the-real-cause-nobody-tells-2826</link>
      <guid>https://dev.to/junhee916/nextjs-14-could-not-find-the-module-in-the-react-client-manifest-the-real-cause-nobody-tells-2826</guid>
      <description>&lt;p&gt;I run a small AI product on a single cheap VM, deploying it myself. One morning the homepage started throwing 500s — not always, just &lt;em&gt;sometimes&lt;/em&gt;. The admin pages were fine. The CSS was fine. Only some routes died, and only in production.&lt;/p&gt;

&lt;p&gt;The error in the PM2 logs was this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Error: Could not find the module
"/tmp/riel_agent_build/src/app/page.tsx#default"
in the React Client Manifest.
This is probably a bug in the React Server Components bundler.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;"Probably a bug in the bundler." It wasn't. It was me. If you're self-hosting Next.js 14 (App Router / RSC) and seeing this, here's what's actually happening — and it took me far too long to see it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The setup that caused it
&lt;/h2&gt;

&lt;p&gt;My deploy script did something that &lt;em&gt;looks&lt;/em&gt; perfectly reasonable:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Build the app in a scratch directory: &lt;code&gt;/tmp/riel_agent_build&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Keep the old, running &lt;code&gt;.next&lt;/code&gt; untouched during the build (zero downtime)&lt;/li&gt;
&lt;li&gt;When the build succeeds, swap just the new &lt;code&gt;.next&lt;/code&gt; into the live app directory &lt;code&gt;/home/me/app/riel_agent&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Build somewhere safe, then move only the output. Classic atomic deploy. The problem is that &lt;strong&gt;one of those build artifacts is not relocatable.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The real cause: RSC bakes an absolute path into the client manifest
&lt;/h2&gt;

&lt;p&gt;In the Next.js App Router, React Server Components need a &lt;em&gt;client manifest&lt;/em&gt; — a map that tells the server which client module to hydrate for each &lt;code&gt;"use client"&lt;/code&gt; boundary. In Next.js 14, the keys in that manifest are written using the &lt;strong&gt;absolute path of the directory the build ran in&lt;/strong&gt; (the build CWD).&lt;/p&gt;

&lt;p&gt;So when I built in &lt;code&gt;/tmp/riel_agent_build&lt;/code&gt;, the manifest was full of keys like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/tmp/riel_agent_build/src/app/page.tsx#default
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then I moved &lt;code&gt;.next&lt;/code&gt; to &lt;code&gt;/home/me/app/riel_agent&lt;/code&gt; and started the server from &lt;em&gt;there&lt;/em&gt;. At runtime, Next resolves modules relative to the real CWD — &lt;code&gt;/home/me/app/riel_agent/...&lt;/code&gt; — but the manifest is still pointing at &lt;code&gt;/tmp/riel_agent_build/...&lt;/code&gt;. The two no longer match. For any route that crosses a server→client boundary, the lookup fails:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Could not find the module &lt;code&gt;/tmp/riel_agent_build/...&lt;/code&gt; in the React Client Manifest.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Why "only sometimes"? Because routes with no client component (or that were statically pre-rendered) don't hit the manifest at all. Pure-static pages render fine; the moment a route needs to hydrate a client boundary at request time, it 500s. That's why my admin pages looked healthy while the homepage flickered between working and broken.&lt;/p&gt;

&lt;p&gt;The tell is right there in the error string: it's an &lt;strong&gt;absolute path that is not where your app actually lives.&lt;/strong&gt; If you ever see &lt;code&gt;/home/runner/...&lt;/code&gt; (GitHub Actions) or &lt;code&gt;/tmp/...&lt;/code&gt; in this error, you have the exact same disease. (I had previously hit the &lt;code&gt;/home/runner&lt;/code&gt; version of this and "fixed" it by moving the build to &lt;code&gt;/tmp&lt;/code&gt; — i.e. I moved the bug, not removed it.)&lt;/p&gt;
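
&lt;p&gt;That "tell" can be turned into a quick mechanical test. The sketch below is only an illustration, and the helper name &lt;code&gt;is_foreign_module_path&lt;/code&gt; is made up: it takes the directory the server really runs from and the path quoted in the error, and succeeds when the path lies outside that directory.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed when MODULE_PATH lies outside APP_ROOT.
# Usage: is_foreign_module_path APP_ROOT MODULE_PATH
is_foreign_module_path() {
  local app_root="${1%/}"   # tolerate a trailing slash on the root
  local module_path="$2"
  case "$module_path" in
    "$app_root"/*) return 1 ;;   # lives under the app directory
    *) return 0 ;;               # was built somewhere else
  esac
}
```

&lt;p&gt;With the paths from this post, &lt;code&gt;is_foreign_module_path /home/me/app/riel_agent "/tmp/riel_agent_build/src/app/page.tsx#default"&lt;/code&gt; succeeds, which is the signal that the build was relocated after it ran.&lt;/p&gt;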

&lt;h2&gt;
  
  
  The fix: build &lt;em&gt;in place&lt;/em&gt;, into a sibling output dir
&lt;/h2&gt;

&lt;p&gt;The relocation was the whole problem, so the fix is to &lt;strong&gt;never relocate.&lt;/strong&gt; Build with the real app directory as the CWD, and only redirect the &lt;em&gt;output folder&lt;/em&gt;, not the working directory.&lt;/p&gt;

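&lt;p&gt;Next.js already supports a configurable output directory through the &lt;code&gt;distDir&lt;/code&gt; option. My &lt;code&gt;next.config.js&lt;/code&gt; reads it from an env var (the name &lt;code&gt;NEXT_DIST_DIR&lt;/code&gt; is my own convention, not something Next.js defines):&lt;br&gt;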
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// next.config.js&lt;/span&gt;
&lt;span class="nx"&gt;module&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;exports&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;distDir&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;NEXT_DIST_DIR&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;.next&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="c1"&gt;// ...&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So the deploy becomes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;set&lt;/span&gt; &lt;span class="nt"&gt;-euo&lt;/span&gt; pipefail                     &lt;span class="c"&gt;# abort before the swap if any step fails&lt;/span&gt;
&lt;span class="nb"&gt;cd&lt;/span&gt; /home/me/app/riel_agent           &lt;span class="c"&gt;# real CWD, same as runtime&lt;/span&gt;

&lt;span class="c"&gt;# build into a NEW folder, leaving the live .next serving traffic&lt;/span&gt;
&lt;span class="nb"&gt;rm&lt;/span&gt; &lt;span class="nt"&gt;-rf&lt;/span&gt; node_modules/.cache           &lt;span class="c"&gt;# drop any path-polluted cache&lt;/span&gt;
&lt;span class="nb"&gt;rm&lt;/span&gt; &lt;span class="nt"&gt;-rf&lt;/span&gt; .next.new .next.previous     &lt;span class="c"&gt;# clear leftovers so both mv calls are plain renames&lt;/span&gt;
&lt;span class="nv"&gt;NEXT_DIST_DIR&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;.next.new npx next build

&lt;span class="c"&gt;# sanity-check the output before swapping (see guard below)&lt;/span&gt;

&lt;span class="c"&gt;# swap: two renames on the same filesystem&lt;/span&gt;
&lt;span class="nb"&gt;mv&lt;/span&gt; .next .next.previous
&lt;span class="nb"&gt;mv&lt;/span&gt; .next.new .next
pm2 reload riel_agent
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now the manifest keys are written as &lt;code&gt;/home/me/app/riel_agent/...&lt;/code&gt; — which is exactly where the server runs from. The paths match, the 500s stop, and I still get a zero-downtime swap because the old &lt;code&gt;.next&lt;/code&gt; keeps serving until the very last &lt;code&gt;mv&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Two details that matter:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Clear &lt;code&gt;node_modules/.cache&lt;/code&gt;.&lt;/strong&gt; Webpack/Next caches can carry the old build path forward and reintroduce the mismatch. A poisoned cache will happily rebuild the wrong absolute paths.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The old &lt;code&gt;.next&lt;/code&gt; stays live during the build.&lt;/strong&gt; Because you're building into &lt;code&gt;.next.new&lt;/code&gt;, the running app never loses its &lt;code&gt;.next&lt;/code&gt; while the build runs. The only moment of change is the pair of &lt;code&gt;mv&lt;/code&gt; calls. Each one is an atomic rename on the same filesystem, so &lt;code&gt;.next&lt;/code&gt; is absent only for the instant between them.&lt;/li&gt;
&lt;/ul&gt;
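
&lt;p&gt;One way to make the two-step swap a little more defensive is to restore the old build if the second rename fails. The sketch below is an assumption layered on top of the script above, not part of it, and &lt;code&gt;swap_build&lt;/code&gt; is a hypothetical name.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# Hypothetical helper: swap NEW into LIVE, keeping the old build as PREVIOUS.
# If the second rename fails, the old build is moved back into place.
# Usage: swap_build LIVE NEW PREVIOUS
swap_build() {
  local live="$1" new="$2" previous="$3"

  # Refuse to touch anything unless the new build directory exists.
  [ -d "$new" ] || return 1

  # Remove a stale backup so the rename below does not nest directories.
  rm -rf "$previous"

  # Step 1: set the live build aside (skipped on a first deploy).
  if [ -e "$live" ]; then
    mv "$live" "$previous" || return 1
  fi

  # Step 2: promote the new build, or roll back if that fails.
  if ! mv "$new" "$live"; then
    if [ -e "$previous" ]; then
      mv "$previous" "$live"
    fi
    return 1
  fi
}
```

&lt;p&gt;In the deploy script this would stand in for the two bare &lt;code&gt;mv&lt;/code&gt; lines, for example &lt;code&gt;swap_build .next .next.new .next.previous&lt;/code&gt;, followed by the PM2 reload only when it returns 0.&lt;/p&gt;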

&lt;h2&gt;
  
  
  The guard rail I added so it can never ship silently again
&lt;/h2&gt;

&lt;p&gt;A deploy that produces a &lt;em&gt;technically successful build&lt;/em&gt; but a &lt;em&gt;broken manifest&lt;/em&gt; is the worst kind, because it passes "did the build exit 0?" and still takes the site down. So I added a dumb, deterministic check before the swap: grep the new server output for the foreign build paths that have bitten me before.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# after building into .next.new, before swapping&lt;/span&gt;
&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-rqE&lt;/span&gt; &lt;span class="s1"&gt;'/tmp/|/home/runner/'&lt;/span&gt; .next.new/server&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
  &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"FATAL: foreign build path leaked into manifest — refusing to swap"&lt;/span&gt;
  &lt;span class="nb"&gt;exit &lt;/span&gt;1
&lt;span class="k"&gt;fi&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If any &lt;code&gt;/tmp/...&lt;/code&gt; or &lt;code&gt;/home/runner/...&lt;/code&gt; string made it into the server bundle, the deploy refuses to swap and the previous build keeps running. No LLM judgment, no heuristics — just a string match for "this build was made somewhere it shouldn't have been."&lt;/p&gt;
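
&lt;p&gt;When the guard does fire, it helps to know which files carry the stray path. A possible variant, sketched here with a hypothetical function name and the same two prefixes as the default pattern, prints the offending files instead of only returning a status.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every file under DIR that mentions a foreign
# build path. Returns 0 if at least one file matches, 1 if none do.
# Usage: list_foreign_path_files DIR [PATTERN]
list_foreign_path_files() {
  local dir="$1"
  local pattern="${2:-/tmp/|/home/runner/}"
  # -r recurse, -l print matching file names only, -E extended regex.
  grep -rlE -- "$pattern" "$dir"
}
```

&lt;p&gt;Used as &lt;code&gt;if list_foreign_path_files .next.new/server; then exit 1; fi&lt;/code&gt;, it refuses the swap exactly as before, but the deploy log also shows where the foreign path ended up.&lt;/p&gt;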

&lt;h2&gt;
  
  
  The lesson
&lt;/h2&gt;

&lt;p&gt;The interesting part isn't the Next.js trivia. It's that &lt;strong&gt;a build artifact had a hidden dependency on its own location&lt;/strong&gt;, and my "safe" deploy strategy quietly violated it. The error blamed the bundler; the real bug was an assumption in my pipeline — "build output is relocatable" — that happened to be false for exactly one file.&lt;/p&gt;

&lt;p&gt;When a green build still breaks production, stop trusting "it compiled" and look for the thing that's &lt;em&gt;environment-specific&lt;/em&gt;: an absolute path, a baked-in env var, a cache. The fix is rarely more code. It's removing the assumption.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I'm building &lt;a href="https://aicoreutility.com" rel="noopener noreferrer"&gt;aicoreutility.com&lt;/a&gt; in the open — a full AI product run by one person on one small VM. Most of what I write here is the unglamorous infrastructure that broke first. This one cost me a morning of 500s.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>nextjs</category>
      <category>reactservercomponents</category>
      <category>deployment</category>
      <category>selfhosting</category>
    </item>
  </channel>
</rss>
