Alok Ranjan Daftuar

Posted on • Originally published at aloknecessary.github.io

Why Lift-and-Shift Fails Quietly: Architectural Smells That Appear After Migration

Every cloud migration starts with a promise: "We'll get onto cloud first, optimize later." That sentence is where the trouble begins.

Lift-and-shift leaves on-premises assumptions baked into a system operating in a fundamentally different environment. The failure doesn't arrive on day one. It arrives three months later, in a Slack alert at 2am, or in an invoice that makes a VP ask uncomfortable questions.


1. Latency Amplification

On a physical LAN, a service call is sub-millisecond. In a cloud VPC, even same-AZ calls incur 1-3ms. A service making 40 synchronous downstream calls goes from ~4ms of network overhead (at ~0.1ms per call) to 40-120ms, without any code change.

Same call graph. Same code. 10-30x more network overhead, purely from network topology.
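
The arithmetic is simple enough to sketch. The snippet below assumes the calls run strictly one after another and that a LAN hop costs about 0.1ms (an illustrative value for "sub-millisecond", not a measurement); `sequential_overhead_ms` is a hypothetical helper name.

```python
def sequential_overhead_ms(calls: int, per_call_ms: float) -> float:
    """Network overhead when every call waits for the previous one to finish."""
    return calls * per_call_ms


CALLS = 40  # synchronous downstream calls per request

# Per-call latencies: 0.1 ms is an assumed LAN value, 1-3 ms is the same-AZ range.
for label, per_call_ms in [("LAN, ~0.1 ms/call", 0.1),
                           ("cloud, 1 ms/call", 1.0),
                           ("cloud, 3 ms/call", 3.0)]:
    total = sequential_overhead_ms(CALLS, per_call_ms)
    print(f"{label}: {total:.0f} ms")
```

Because the overhead is linear in the call count, halving the number of sequential hops halves the network cost regardless of the per-hop figure.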

Fix: consolidate reads with batch APIs, introduce async messaging for non-critical paths, add caching for hot reference data.
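
For the caching part of that fix, a minimal in-process sketch might look like the following. `TTLCache`, `loader`, and the 30-second TTL are illustrative assumptions, not something prescribed by the article; the class is not thread-safe and has no size limit.

```python
import time
from typing import Callable, Dict, Generic, Hashable, Tuple, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


class TTLCache(Generic[K, V]):
    """Tiny cache: an entry is reused until ttl_seconds after it was loaded."""

    def __init__(self, loader: Callable[[K], V], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader          # the expensive remote lookup
        self._ttl = ttl_seconds
        self._clock = clock            # injectable so expiry can be tested
        self._entries: Dict[K, Tuple[float, V]] = {}

    def get(self, key: K) -> V:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]            # still fresh: no network call
        value = self._loader(key)      # missing or expired: reload once
        self._entries[key] = (now + self._ttl, value)
        return value


def load_currency(code: str) -> str:
    """Hypothetical stand-in for a remote reference-data lookup."""
    return code.upper()


currencies = TTLCache(load_currency, ttl_seconds=30)
print(currencies.get("eur"), currencies.get("eur"))  # second call is served from memory
```

This only suits data that tolerates being up to one TTL stale, which is the usual property of "hot reference data".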


2. Chatty Services

The N+1 problem at infrastructure scale. A service making 60 per-entity HTTP calls to render a dashboard is annoying on LAN. In cloud, at 5-10ms per sequential round trip, it's a 300-600ms tax on every page load.

Chatty patterns also exhaust connection pools faster — each call traverses the network and holds an open connection during transit.

Fix: batch endpoints on all internal APIs, DataLoader pattern, connection pool profiling under realistic concurrency.
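
The DataLoader idea can be sketched in a few lines of asyncio: keys requested in the same event-loop turn are collected and sent as one batch. This is a simplified illustration, not the reference DataLoader implementation; `BatchLoader`, `fetch_users`, and `render_dashboard` are hypothetical names, and `fetch_users` stands in for whatever batch endpoint the internal API exposes.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Set


class BatchLoader:
    """Coalesces load(key) calls made in one event-loop turn into a single batch call."""

    def __init__(self, batch_fn: Callable[[List[str]], Awaitable[Dict[str, dict]]]) -> None:
        self._batch_fn = batch_fn
        self._pending: Dict[str, asyncio.Future] = {}
        self._tasks: Set[asyncio.Task] = set()  # strong refs so flush tasks are not collected

    def load(self, key: str) -> asyncio.Future:
        loop = asyncio.get_running_loop()
        if not self._pending:
            # First key of this turn: schedule one flush. It starts only after
            # the current synchronous code has finished queuing its keys.
            task = loop.create_task(self._flush())
            self._tasks.add(task)
            task.add_done_callback(self._tasks.discard)
        if key not in self._pending:
            self._pending[key] = loop.create_future()
        return self._pending[key]

    async def _flush(self) -> None:
        pending, self._pending = self._pending, {}
        try:
            results = await self._batch_fn(list(pending))
        except Exception as exc:
            for future in pending.values():
                if not future.done():
                    future.set_exception(exc)
            return
        for key, future in pending.items():
            if future.done():
                continue  # caller cancelled while the batch was in flight
            if key in results:
                future.set_result(results[key])
            else:
                future.set_exception(KeyError(key))


batch_calls: List[List[str]] = []


async def fetch_users(keys: List[str]) -> Dict[str, dict]:
    """Hypothetical batch endpoint: one round trip for many ids."""
    batch_calls.append(keys)
    return {key: {"id": key} for key in keys}


async def render_dashboard() -> List[dict]:
    loader = BatchLoader(fetch_users)
    return await asyncio.gather(*(loader.load(f"user-{i}") for i in range(60)))


rows = asyncio.run(render_dashboard())
print(len(rows), "rows from", len(batch_calls), "batch call(s)")
```

Call sites keep asking for one entity at a time, so the rendering code does not need to change shape; only the transport underneath is batched.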


3. Cost Surprises

The PoC cost $340. The first production month is $8,200. Nobody changed the architecture.

  • Data egress: free on-prem, metered in cloud. Cross-AZ, cross-region, and internet egress all bill.
  • Over-provisioning: on-prem sizing instincts (buy for 3-5 years) don't translate. Cloud bills provisioned capacity for every hour it exists, whether the CPU is busy or idle.
  • Idle infrastructure: dev/staging environments left running 24/7.
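
A back-of-the-envelope sketch of two of these drivers follows. The usage pattern (10 hours a day, 5 days a week), the transfer volume, and the per-GB rate are all placeholder assumptions for illustration, not real provider prices; substitute figures from your own bill.

```python
HOURS_PER_WEEK = 24 * 7  # 168


def idle_fraction(used_hours_per_week: float) -> float:
    """Share of an always-on environment's billed hours in which nobody uses it."""
    return 1 - used_hours_per_week / HOURS_PER_WEEK


def monthly_egress_cost(gb_per_month: float, rate_per_gb: float) -> float:
    """Metered transfer cost: volume times the per-GB rate."""
    return gb_per_month * rate_per_gb


# Assumed usage: a dev environment touched 10 h/day, 5 days/week, but never shut down.
print(f"idle share of billed hours: {idle_fraction(10 * 5):.0%}")

# Placeholder volume and rate (NOT a real price list).
print(f"egress: ${monthly_egress_cost(5000, 0.02):.2f}/month")
```

Under that usage assumption roughly 70% of the environment's billed hours are idle, which is why scheduled shutdown of non-production environments tends to be an early cost fix.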

4. Stateful Assumptions

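In-memory session state works with a single server. The moment you auto-scale without sticky sessions, most requests hit instances with no session: with N instances behind a round-robin load balancer, roughly (N-1)/N of a user's requests land on an instance that never saw the login (about 67% with three instances). Filesystem dependencies break when containers reschedule or pods restart.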

Fix: externalize session to Redis. Replace local filesystem writes with object storage at the upload boundary.
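
A small simulation shows the difference between per-instance and externalized session state. It is a sketch under simplifying assumptions: round-robin routing, no sticky sessions, and a plain shared `dict` standing in for Redis. `Instance` and `miss_rate` are hypothetical names.

```python
from itertools import cycle
from typing import Dict, List, Optional


class Instance:
    """One app server. Uses its own memory unless a shared store is injected."""

    def __init__(self, shared_store: Optional[Dict[str, dict]] = None) -> None:
        # A shared dict stands in for an external store such as Redis.
        self.sessions: Dict[str, dict] = shared_store if shared_store is not None else {}

    def login(self, session_id: str) -> None:
        self.sessions[session_id] = {"user": "demo"}

    def has_session(self, session_id: str) -> bool:
        return session_id in self.sessions


def miss_rate(instances: List[Instance], requests: int) -> float:
    """Log in on the first instance, then round-robin requests across all of them."""
    instances[0].login("s1")
    targets = cycle(instances)
    misses = sum(not next(targets).has_session("s1") for _ in range(requests))
    return misses / requests


in_memory = [Instance() for _ in range(3)]
store: Dict[str, dict] = {}
externalized = [Instance(store) for _ in range(3)]

print(f"in-memory sessions:    {miss_rate(in_memory, 300):.0%} of requests miss")
print(f"externalized sessions: {miss_rate(externalized, 300):.0%} of requests miss")
```

With three instances the in-memory variant misses on two of every three requests, while the shared store misses on none. A real deployment would also set a TTL on each session key so abandoned sessions expire.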


5. The Observability Void

On-prem monitoring (Nagios, Zabbix) watches hardware metrics that mean nothing in cloud. What you need to observe is different: cold start times, managed service throttling, connection pool utilization, cost-per-request.

The danger window is immediately after migration when legacy monitoring reports "all green" while user-facing metrics degrade invisibly.
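
One way to make that degradation visible is to track tail latency rather than averages or host health. The sketch below uses a nearest-rank percentile over made-up latency samples; the sample values are invented for illustration, and in practice these numbers would come from request metrics rather than a list in memory.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("percentile() needs at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Invented request latencies in milliseconds, 100 requests each.
before_migration = [20.0] * 95 + [40.0] * 5
after_migration = [22.0] * 80 + [450.0] * 20  # a fifth of requests now hit a slow path

for name, samples in [("before", before_migration), ("after", after_migration)]:
    print(f"{name}: p50={percentile(samples, 50):.0f} ms, p95={percentile(samples, 95):.0f} ms")
```

In this invented data the median barely moves (20ms to 22ms) while p95 jumps from 20ms to 450ms, the kind of shift a hardware-centric dashboard would not show.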


6. The Monolith in Microservice Clothing

Containerized and deployed to Kubernetes with separate deployments per service. On the surface: microservices. Underneath: shared database schemas, synchronous HTTP chains, coordinated deployments. A distributed monolith you think is clean is a production incident waiting to happen.
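
Shared schemas are the easiest of these couplings to check mechanically. Assuming you can produce a map of which service touches which tables (from query logs, ORM models, or migrations), a sketch like this flags the overlaps. The service and table names are invented, and `shared_tables` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, List, Set


def shared_tables(access: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Map each table touched by more than one service to the sorted list of those services."""
    users: Dict[str, Set[str]] = defaultdict(set)
    for service, tables in access.items():
        for table in tables:
            users[table].add(service)
    return {table: sorted(services)
            for table, services in sorted(users.items())
            if len(services) > 1}


# Invented example inventory: service name -> tables it reads or writes.
access = {
    "orders": {"orders", "customers"},
    "billing": {"invoices", "customers"},
    "catalog": {"products"},
}

print(shared_tables(access))
```

Every table in the result is a place where two deployments cannot change independently, which is a reasonable starting list for the coupling to untangle.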


A Realistic Migration Philosophy

Lift-and-shift is not a failure state. It's a phase. The mistake is treating it as a destination. Every migrated workload should have a documented list of known architectural debts, an owner for each, and a timeline to address them — agreed before the migration.

Moving to cloud does not modernize your architecture. It gives you a new environment in which your existing architectural decisions — good and bad — will be amplified.


Read the Full Article

This is a summary of my deep dive into post-migration architectural smells. The full article covers all six patterns with diagnostics, mitigations, and a pre-migration review checklist:

👉 Why Lift-and-Shift Fails Quietly — Full Article

The full article includes:

  • Latency amplification with SVG architecture diagram (on-prem vs cloud)
  • Chatty services with before/after code examples and connection pool diagnostics
  • Cost surprise breakdown with egress pricing tables
  • Stateful assumptions with session externalization code (Node.js/Redis)
  • Observability void with Prometheus recording rules for post-migration signals
  • Distributed monolith diagnostic patterns
  • Complete pre-migration architecture review checklist
