DEV Community: Atharva Unde

Your Cache Is Making Things Worse

Atharva Unde — Sun, 07 Jun 2026 00:00:00 +0000

The Seductive Lie of Caching

Teams love caching. It feels like free performance. Add Redis, cache everything, boom—10x faster. Everyone's happy. Except it's not free. And faster isn't always better.

Bad caching creates more problems than it solves:

User makes a payment, sees it pending for 2 hours (cached stale data)
You deploy code, but old cached values are still served to half your users
Cache hit rate is 40% because TTL is wrong
Redis memory fills up and starts evicting random keys
You spend 3 hours debugging "why is this endpoint returning the wrong data?" (it's the cache)

The real cost of caching isn't the cache hit. It's the cache miss you didn't plan for.

The Framework: When to Cache

Ask one question before caching anything:

"If this data is wrong for 30 seconds, does it break something?"

If yes, don't cache, or cache with a very short TTL. If no, cache.

Safe to Cache

User preferences (can be wrong for an hour)
Public data (can be wrong for a day)
Computed reports (can be wrong for 5 minutes)
Product catalogs (can be wrong for 10 minutes)

Not Safe to Cache

Payment status (wrong for 1 second = problem)
User balance (wrong for 1 second = problem)
Authorization decisions (wrong for 1 second = problem)
Session state (wrong for 1 second = problem)

The pattern: anything where freshness is critical, don't cache aggressively.
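The question above can be turned into a tiny decision helper. This is only a sketch: `cacheDecision` is a hypothetical name, the 30-second threshold comes from the question in this section, and setting the TTL equal to the tolerated staleness assumes TTL is your only invalidation mechanism.

```javascript
// Hypothetical helper: turn "how long can this data be wrong?" into a decision.
function cacheDecision(maxStaleSeconds) {
  // If 30 seconds of wrong data breaks something, skip the cache.
  if (maxStaleSeconds < 30) {
    return { cache: false, ttlSeconds: 0 };
  }
  // With TTL-only invalidation, worst-case staleness equals the TTL.
  return { cache: true, ttlSeconds: maxStaleSeconds };
}

console.log(cacheDecision(1));    // payment status: not cached
console.log(cacheDecision(3600)); // user preferences: cached for an hour
```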

The Real Problem: Cache Invalidation

There are only two hard things in Computer Science: cache invalidation and naming things. Teams underestimate this. You cache something. Great. Now you need to invalidate it.

When do you invalidate?

Option 1: Automatic TTL

Simple, but serves stale data
Good when staleness is acceptable
Bad when freshness is critical
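A minimal sketch of TTL-based expiry, using an in-memory Map as a stand-in for Redis. The `TtlCache` class and its injectable `now` clock are illustrative assumptions, not a real client API.

```javascript
// Minimal in-memory TTL cache. `now` is injectable so expiry can be tested.
class TtlCache {
  constructor(ttlMs, now = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
    this.entries = new Map(); // key -> { value, expiresAt }
  }

  set(key, value) {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  get(key) {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // expired: treat as a miss
      return undefined;
    }
    return entry.value; // may be stale by up to ttlMs
  }
}
```

The trade-off is visible in `get`: nothing checks the source of truth, so a value can be wrong for the full TTL.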

Option 2: Invalidate on writes

Complex, but fresh data
Good when updates are infrequent
Bad when updates are frequent (you invalidate more often than you hit the cache)
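Invalidate-on-write could look roughly like this. `db` and `cache` are in-memory stand-ins (assumptions), not a real database or Redis client, and `getUser` / `updateUser` are hypothetical names.

```javascript
// Cache-aside reads plus invalidate-on-write, with Maps as stand-ins.
const db = new Map([["u1", { email: "old@example.com" }]]);
const cache = new Map();

function getUser(id) {
  if (cache.has(id)) return cache.get(id); // hit
  const user = db.get(id);                 // miss: read the source of truth
  cache.set(id, user);
  return user;
}

function updateUser(id, changes) {
  db.set(id, { ...db.get(id), ...changes }); // write the source of truth first
  cache.delete(id);                           // then drop the cached copy
}
```

Every write path has to remember the `cache.delete` call. Miss one and you are back to serving stale data.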

Option 3: Event-based invalidation

Most complex, but flexible
Good for distributed systems
Bad for tightly coupled systems
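A sketch of event-based invalidation, using Node's built-in EventEmitter as an in-process stand-in for a message bus. The event name `user.updated` and the service names are made up for illustration. A real broker can drop or delay messages, which this toy version cannot show.

```javascript
const { EventEmitter } = require("node:events");

// In-process stand-in for a message bus (assumption).
const bus = new EventEmitter();

function createServiceCache(name) {
  const cache = new Map();
  // Each service subscribes and drops its own copy when the entity changes.
  bus.on("user.updated", ({ id }) => cache.delete(id));
  return { name, cache };
}

const profileService = createServiceCache("profile");
const billingService = createServiceCache("billing");
profileService.cache.set("u1", { email: "old@example.com" });
billingService.cache.set("u1", { email: "old@example.com" });

// The writer publishes one event; it does not need to know who caches what.
bus.emit("user.updated", { id: "u1" });
```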

The mistake: Choosing invalidation strategy after caching is already deployed. Choose it first.

Common Mistakes

Mistake 1: Caching Database Queries Without Invalidation Strategy

You cache user.find(userId). Great. User updates their email. Oops. Cache still has old email. Now you invalidate: remove cache when user is updated. Great.

Now user updates email, then profile picture. Two cache invalidations? Or one? Distributed system? Now you invalidate on 3 services. One service misses the invalidation message.

This is why cache invalidation is hard.

Mistake 2: Using Redis as a Database

Redis is fast. So teams use it for persistence. Then the server crashes and, without persistence configured, Redis data is gone. Or: Redis fills up and the eviction policy deletes keys you needed. Or: replication lags or fails, and data is inconsistent across nodes.

Use Redis as a cache (data loss acceptable) or use a real database (data loss not acceptable). Don't use it as both.

Mistake 3: Caching Expensive Computations Without Measuring

You have an expensive database query. Takes 500ms. Cache it. Cache hit drops it to 1ms from Redis.

Except: the network call to Redis takes 5ms, so a hit costs 5ms, not 1ms. A cache miss takes 510ms (5ms failed lookup + 500ms query + 5ms write to the cache). Cache hit rate is 60%.

Average latency: (0.6 × 5ms) + (0.4 × 510ms) = 207ms.

Without cache: 500ms.

With cache: 207ms.

Better? Yes. But you never measured. You just assumed caching helps.

Measure actual latency impact, not just cache hit rate.
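The arithmetic above fits in one function, which makes it easy to plug in your own measured numbers. `averageLatencyMs` is a hypothetical helper name.

```javascript
// Expected latency per request for a given hit rate.
function averageLatencyMs(hitRate, hitMs, missMs) {
  return hitRate * hitMs + (1 - hitRate) * missMs;
}

// Numbers from this section: 60% hit rate, 5ms hit, 510ms miss.
console.log(averageLatencyMs(0.6, 5, 510)); // about 207, versus 500 uncached
```

Feed it latencies you measured in production rather than estimates. The result is only as good as the inputs.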

Mistake 4: Not Setting Max Memory Policy

Redis memory fills up. What happens? By default: Redis stops accepting writes. System breaks. You configured it: evict least-recently-used keys. Now old data disappears unexpectedly. You configured it: evict random keys. Even worse.

Know what your max memory policy is. Don't let it be a surprise.
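To see why the policy matters, here is a toy model of two of those behaviours. It is deliberately simplified: real Redis limits bytes rather than key counts, and its LRU is approximate. The `BoundedCache` class is an illustration, not how Redis is implemented.

```javascript
// Toy model of two maxmemory behaviours using a Map capped at `maxKeys`.
class BoundedCache {
  constructor(maxKeys, policy) {
    this.maxKeys = maxKeys;
    this.policy = policy;     // "noeviction" or "lru"
    this.entries = new Map(); // Map preserves insertion order
  }

  get(key) {
    if (!this.entries.has(key)) return undefined;
    const value = this.entries.get(key);
    this.entries.delete(key); // re-insert to mark as most recently used
    this.entries.set(key, value);
    return value;
  }

  set(key, value) {
    if (!this.entries.has(key) && this.entries.size >= this.maxKeys) {
      if (this.policy === "noeviction") {
        throw new Error("OOM: write rejected");
      }
      const oldest = this.entries.keys().next().value;
      this.entries.delete(oldest); // silently drop the least recently used key
    }
    this.entries.delete(key);
    this.entries.set(key, value);
  }
}
```

With "noeviction" the failure is loud (writes throw). With "lru" it is silent (a key you expected is simply gone).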

Mistake 5: Assuming Cache Hits Always Improve Latency

Network latency to Redis: 5ms

Network latency to database: 50ms

Database query time: 400ms

Cache hit latency: 5ms

Cache miss latency: 455ms

No cache latency: 450ms

Cache hit saves 445ms. Great. But cache miss is slower than no cache. So your average depends on hit rate.

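With these numbers, the average is (hit rate × 5ms) + (miss rate × 455ms), which beats 450ms once the hit rate passes roughly 1%. The break-even climbs quickly when the uncached call is cheap: if the database answered in 10ms instead of 450ms, a miss would cost 15ms and you would need a 50% hit rate before the cache paid for itself. Below the break-even, caching is pure overhead.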

Measure actual latency, not hit rate.
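A quick way to sanity-check any hit-rate threshold is to solve for the break-even directly. Setting h × hit + (1 − h) × miss equal to the uncached latency gives h = (miss − uncached) / (miss − hit). `breakEvenHitRate` is a hypothetical helper, and the second call uses made-up numbers for a cheap backend.

```javascript
// Hit rate at which average cached latency equals uncached latency.
function breakEvenHitRate(hitMs, missMs, noCacheMs) {
  return (missMs - noCacheMs) / (missMs - hitMs);
}

// Numbers from this section: 5ms hit, 455ms miss, 450ms uncached.
console.log(breakEvenHitRate(5, 455, 450)); // about 0.011

// A cheap 10ms backend with the same 5ms Redis hop.
console.log(breakEvenHitRate(5, 15, 10)); // 0.5
```

The cheaper the thing you are caching, the higher the hit rate you need before the extra hop pays off.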

The Framework That Works

Only cache what's safe to be stale
- Freshness requirement determines TTL
- Payment? 0 cache or 10 second TTL max
- User preference? 1 hour cache is fine
Have invalidation strategy before deployment
- TTL? Event-based? On-write invalidation?
- Don't say "we'll figure it out"
- Invalidation complexity should influence your cache decision
Measure actual impact
- Don't trust hit rate
- Measure latency with and without cache
- Measure memory cost
- If benefit is small, remove cache
Know when caching makes things worse
- Distributed system with eventual consistency? Caching amplifies the problem
- High memory cost for low benefit? Remove it
- Debugging takes 10x longer? Not worth it
Monitor cache behavior in production
- Hit rate vs latency
- Memory usage
- Eviction rate
- If behavior changes, investigate

When Not to Cache

Payment systems (use real-time data)
Authorization (use real-time data)
Distributed systems with complex invalidation (keep it simple)
Data that changes frequently but you cache anyway (you'll serve lies)
Just to mask slow databases (fix the database instead)

The Real Insight

Caching is optimizing for the wrong thing. You cache a database query to make it faster. But why is the query slow?

Bad indexes? Fix it
N+1 queries? Fix it
Missing pagination? Fix it
Bad query logic? Fix it

Most teams add caching to mask poor database design. Fix the root cause. Use caching for genuinely expensive operations that are unavoidable.

TL;DR

Only cache what's safe to be stale. If data needs to be fresh, don't cache aggressively. Have invalidation strategy first. Choose TTL, on-write, or event-based before you cache. Measure actual impact. Hit rate alone is meaningless. Measure latency and memory cost. Don't cache to mask slow databases. Fix the database. Monitor in production. If caching behavior changes, investigate.

Bad caching creates more problems than it solves. Good caching is invisible because the trade-offs are understood and managed.

Tags: caching · Redis · performance · backend · architecture · database optimization · distributed systems · DevOps

The Vendor Lock-in Dilemma: Speed vs. Flexibility in Cloud Architecture

Atharva Unde — Tue, 02 Jun 2026 00:00:00 +0000

A Question That Won't Go Away

I get asked this question a lot: "When you're building a side project or working on something, how do you decide what to use?"

The conversation usually goes like this:

People are debating database engines (MongoDB vs. PostgreSQL), event-driven architecture (do we need message queues?), caching strategies (Redis? ElastiCache? Self-hosted?).

But beneath every specific question lurks something bigger:

Do we move fast with vendor-locked proprietary solutions, or invest in open-source alternatives and keep our options open?

I rarely see teams or builders ask this thoroughly enough. Most just drift toward vendor lock-in without realizing it.

Why Vendor Services Are So Seductive

Let's be honest: vendor-specific managed services are seductive.

Take MongoDB. It's source-available (SSPL-licensed since 2018), portable, and you can self-host it anywhere. But AWS offers DocumentDB. Azure offers Cosmos DB. Google offers Firestore. They all claim to be drop-in replacements and promise:

Faster time to market
Built-in high availability
Managed backups and patching
Integrated monitoring and security
Attractive per-unit pricing (at first)

The same pattern repeats everywhere. SQS looks unbeatable for messaging: managed, reliable, integrated with your AWS ecosystem. Why would you self-host RabbitMQ or Kafka when AWS handles all operational burden?

On paper, it's a no-brainer. In reality, it's where most teams paint themselves into corners.

The One Question You Should Ask Before Deciding

Whenever I face this decision, I ask one thing:

"What's our long-term commitment to our cloud provider?"

This question sounds straightforward. It's not.

Behind it sits a minefield of unknowns:

Are we locked into a long-term contract? Is there a volume discount that makes switching cost-prohibitive?
What's our flexibility if we need to migrate? If geopolitical events, legal constraints, or a data breach force us to leave AWS tomorrow, how painful is that?
How deeply is our codebase coupled to vendor APIs? If we use DynamoDB instead of MongoDB, how much code only works with AWS?
What's the real cost of exit? Data transfer fees are just the start. The rewrite is the killer.

I've watched organizations face these questions during a crisis, not before. By then, it's catastrophically expensive.

Most teams never ask at all. They just drift toward lock-in.

Your Three Real Options

You have three paths. Each has honest trade-offs.

Option 1: Pure Vendor Lock-in (DynamoDB, SQS, DocumentDB)

Pros:

Fastest implementation
Zero operational overhead
Deep AWS integration
Native features you won't find elsewhere

Cons:

Zero flexibility once committed
Pricing can change unilaterally (and usually does)
Migration becomes a career-defining project
You're betting the company on one vendor's roadmap

Option 2: Self-Hosted Everything (Kafka, RabbitMQ, PostgreSQL on EC2)

Pros:

Complete vendor independence
Full control over every variable
True multi-cloud portability
No surprise price hikes

Cons:

You become the database team
Operational complexity scales fast
On-call burden for infrastructure failures
Hidden costs in personnel and time

Option 3: Managed Open-Source (AWS-managed RabbitMQ, Azure Database for PostgreSQL)

Pros:

Operational relief without vendor lock-in
Uses community standards, not proprietary APIs
Easier migration if you need to switch providers
High availability without managing it yourself

Cons:

Higher per-unit cost than self-hosted
Still some vendor dependency (but less severe)
Fewer cutting-edge features than pure vendor offerings
Still requires some operational knowledge

How I Actually Decide

Here's my decision framework. Use it when the team's split on this:

Ask in order:

Is there any realistic scenario where we might change cloud providers? (Hint: The answer is usually yes.)
If that happened, how much pain could we actually absorb?
What's the probability × impact of that scenario?
How much extra would Option 3 (managed open-source) cost monthly?

If managed open-source costs 15% more per month but saves you a multi-quarter migration, it's almost always worth it.

Run the math. Compare the monthly overhead to the cost of being stuck.
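"Run the math" can be as simple as the sketch below. Every input is a hypothetical estimate you would supply yourself, and `compareLockInCost` is an illustrative name. It compares what portability costs over a planning horizon against probability × impact of a forced migration.

```javascript
// Compare the price of optionality with the expected cost of being stuck.
function compareLockInCost({ monthlyPremium, months, migrationCost, probabilityOfMove }) {
  const portabilityCost = monthlyPremium * months;              // extra spend on managed open-source
  const expectedLockInCost = probabilityOfMove * migrationCost; // probability × impact
  return {
    portabilityCost,
    expectedLockInCost,
    preferPortable: expectedLockInCost > portabilityCost,
  };
}

// Made-up example: $1,500/month premium over 3 years versus a 20% chance
// of a $600,000 migration.
console.log(
  compareLockInCost({
    monthlyPremium: 1500,
    months: 36,
    migrationCost: 600000,
    probabilityOfMove: 0.2,
  })
);
```

The point is to force explicit estimates, not to produce a precise answer. If the team can't agree on the inputs, that disagreement is the real finding.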

When Lock-in Actually Breaks

I've watched teams chase every new AWS feature without questioning the cost. A feature ships, everyone wants it, it gets integrated everywhere. Six months later, we're realizing we've painted ourselves into a corner.

But more critically, I've seen real crises:

Geopolitical constraints forcing data residency changes
Legal actions requiring migration off a specific provider
Competitive pricing suddenly making a different provider's economics irresistible
Contract negotiations where vendor lock-in becomes a liability

The teams that had chosen portable solutions? A few weeks of work.

The teams that had optimized purely for speed? Months of crisis mode, expensive rewrites, and careers defined by technical debt blowing up.

What Actually Matters

The question isn't "should we use vendor lock-in?" It's:

"How much flexibility do we need to sleep at night?"

Different organizations answer differently. A startup with 6 months of runway might reasonably choose speed over portability. An enterprise that can't afford surprises invests in optionality.

What most teams do, defaulting to lock-in without ever consciously asking the question, is the trap.

TL;DR

The next time you're in an architecture discussion and someone says "Let's use DocumentDB because it's managed and fast," ask the question:

"What's our long-term commitment to AWS?"

Listen to the answer. Really listen.

Sometimes the slowest path to launch is the fastest path to long-term survival. And sometimes accepting operational complexity today prevents existential crises tomorrow.

The trick is knowing which is which, and asking before you're in crisis mode.

Tags: vendor lock-in · cloud architecture · AWS · managed services · DynamoDB · multi-cloud · infrastructure decisions · scaling · RabbitMQ · database selection · DevOps strategy

Stop Hardcoding AWS Keys.

Atharva Unde — Tue, 26 May 2026 00:00:00 +0000

The Problem With Hardcoded Keys

When you hardcode AWS access keys and secret keys into your Node.js application, you create a management burden:

Manual Key Rotation: You have to remember to rotate them. Nobody does this consistently.
Secret Storage: You need to store them somewhere (env file, secrets manager, config). Each adds complexity.
Audit Trail: If a key leaks, you need to know who has it and where it's being used.
Risk of Exposure: Every time you export, backup, or move code, you risk exposing credentials.
Deployment Friction: You have to inject secrets into every environment (dev, staging, prod).

If any of those keys get exposed (committed to git, logged, or leaked), you have to revoke them immediately and update everywhere they're used.

That's operational debt.

The Solution: Instance Profiles (for EC2)

An instance profile is a container for an IAM role. When you attach an instance profile to an EC2 instance, applications running on that instance can retrieve temporary credentials from the instance metadata service instead of using static keys.

Your Node.js application doesn't need hardcoded keys. The AWS SDK automatically retrieves temporary credentials through the instance metadata service.

How It Works

1. Create an IAM Role (e.g., "NodeAppRole")
   ↓
2. Attach IAM policies to it (e.g., S3ReadOnly, DynamoDBAccess)
   ↓
3. Create an Instance Profile containing the IAM Role
   ↓
4. Launch your EC2 instance with that Instance Profile
   ↓
5. Applications retrieve temporary credentials from instance metadata
   ↓
6. Node.js SDK automatically picks them up from the credential chain
   ↓
7. Your code makes AWS API calls. Done.

No keys in your code. No env files. No secrets management for credentials.

How Node.js Automatically Picks Up Credentials

The AWS SDK for JavaScript v3 uses a default credential provider chain. It automatically checks multiple sources for credentials, including:

Environment variables
Shared credentials file (~/.aws/credentials)
IAM Identity Center
Web identity tokens
ECS task role credentials
EC2 instance metadata service (if running on EC2)
And others

When you attach an instance profile to an EC2 instance, the SDK automatically discovers and uses the credentials from the instance metadata service.

Your code doesn't need configuration or changes:

// This works automatically if running on an EC2 instance with an Instance Profile
const { S3Client, ListBucketsCommand } = require("@aws-sdk/client-s3");

const client = new S3Client({ region: "us-east-1" });
const command = new ListBucketsCommand({});

client.send(command).then(console.log).catch(console.error);

The SDK fetches credentials from the metadata service behind the scenes. You don't do anything.
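If the "chain" idea feels abstract, here is a simplified model of it: try each source in order and use the first one that returns credentials. This is not the SDK's code. The provider functions, their return shapes, and the fake metadata lookup are all assumptions for illustration, and the real chain is asynchronous.

```javascript
// Simplified model of a credential provider chain (not the SDK's internals).
function resolveFromChain(providers) {
  for (const provider of providers) {
    try {
      const creds = provider();
      if (creds) return creds;
    } catch {
      // This source is unavailable; fall through to the next one.
    }
  }
  throw new Error("Could not load credentials from any provider");
}

// Hypothetical provider: static keys from environment variables.
const fromEnv = (env) => () =>
  env.AWS_ACCESS_KEY_ID && env.AWS_SECRET_ACCESS_KEY
    ? { source: "env", accessKeyId: env.AWS_ACCESS_KEY_ID }
    : null;

// Stand-in for the instance metadata lookup; the real one makes an HTTP call.
const fromFakeInstanceMetadata = () => ({ source: "imds", accessKeyId: "ASIA-EXAMPLE" });

// No env keys set: the chain falls through to instance metadata.
console.log(resolveFromChain([fromEnv({}), fromFakeInstanceMetadata]).source);
```

Order matters. In this model, as in the SDK's default chain, environment variables are checked before instance metadata, so leftover static keys in the environment win over the instance profile.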

Instance Profiles vs Hardcoded Keys: Side by Side

Hardcoded Keys Approach

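// DON'T DO THIS: long-lived static keys passed to the client explicitly
const { S3Client, ListBucketsCommand } = require("@aws-sdk/client-s3");

const client = new S3Client({
  region: "us-east-1",
  credentials: {
    // Static keys injected through the environment still need manual rotation
    accessKeyId: process.env.AWS_ACCESS_KEY_ID,
    secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY,
  },
});

const command = new ListBucketsCommand({});
client.send(command).then(console.log).catch(console.error);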

Problems:

Keys in environment variables (must be injected at runtime)
Keys in .env file (risk of accidental commit)
Rotating keys means redeploying application
Manual tracking of which environments have which keys
If key is compromised, chaos ensues

Instance Profile Approach

// DO THIS
const { S3Client, ListBucketsCommand } = require("@aws-sdk/client-s3");

const client = new S3Client({ region: "us-east-1" });

// No credentials configuration needed. SDK auto-discovers from instance profile.
const command = new ListBucketsCommand({});
client.send(command).then(console.log).catch(console.error);

Benefits:

No keys in code, env vars, or config files
Temporary credentials (AWS manages them automatically for the role)
Fine-grained permissions (one role per application)
Audit trail (CloudTrail tracks which role did what)
Zero application changes for credential management

For EKS (Kubernetes)

For EKS workloads, use IAM Roles for Service Accounts (IRSA) instead of instance profiles.

IRSA works by associating an IAM role with a Kubernetes service account. Pods that use that service account can assume the role and obtain temporary credentials.

# Create a ServiceAccount with an IAM role associated via annotation
apiVersion: v1
kind: ServiceAccount
metadata:
  name: app-service-account
  namespace: default
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::ACCOUNT:role/AppRole

Your Node.js pod automatically gets credentials through the AWS SDK's credential chain.

Important: IRSA provides better isolation than relying on the EC2 instance metadata service directly. However, if the instance metadata service is not restricted, pods may still be able to access the node's IAM role through IMDS. AWS recommends using IMDSv2 and restricting metadata service access to prevent unintended access.

Why This Matters

Operational Burden:

Hardcoded keys = you manage rotation manually
Instance Profiles = AWS manages temporary credentials automatically for the role on the instance

Security:

Hardcoded keys = static credentials, high blast radius if leaked
Instance Profiles = temporary credentials, automatically rotated, limited scope

Auditability:

Hardcoded keys = hard to track (which deployment has which key?)
Instance Profiles = CloudTrail shows exactly which role did what and when

Scalability:

Hardcoded keys = doesn't scale beyond a handful of environments
Instance Profiles = scales to hundreds of services and environments

Setup Example (Quick)

Step 1: Create IAM Role

Role Name: NodeAppRole
Trusted Entity: EC2 service (ec2.amazonaws.com)
Permissions: Attach policies your app needs (S3, DynamoDB, etc.)
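The "Trusted Entity" in Step 1 is expressed as a trust policy on the role. Built as a plain JavaScript object, it looks like this. The variable name is arbitrary, and you would pass the JSON to whatever tool you use to create the role.

```javascript
// Trust policy that lets the EC2 service assume the role.
const trustPolicy = {
  Version: "2012-10-17",
  Statement: [
    {
      Effect: "Allow",
      Principal: { Service: "ec2.amazonaws.com" },
      Action: "sts:AssumeRole",
    },
  ],
};

console.log(JSON.stringify(trustPolicy, null, 2));
```

This only controls who may assume the role. What the role can do still comes from the permission policies you attach.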

Step 2: Create Instance Profile

Instance Profile Name: NodeAppProfile
Attach Role: NodeAppRole

Step 3: Launch EC2 Instance

When launching, specify the Instance Profile. Your app automatically has credentials.

Step 4: Code (No Changes Needed)

Your Node.js code works as-is. The SDK finds credentials automatically.

Common Mistakes

Mistake #1: Still setting AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables.

Why it's wrong: Defeats the purpose. You're back to managing keys manually.
What to do: Remove them. Let the SDK auto-discover from Instance Metadata.

Mistake #2: Using Instance Profile on dev machine but hardcoded keys in CI/CD.

Why it's wrong: Inconsistency. Some environments have keys, some don't.
What to do: Use Instance Profiles everywhere. For local dev, use AWS credentials file or assume a role.

Mistake #3: One broad Instance Profile for all applications.

Why it's wrong: Violates least-privilege principle. If one app is compromised, all can access the same resources.
What to do: Create granular roles. One role per application, with only the permissions it needs.

TL;DR

Problem: Hardcoded AWS keys require manual rotation, create security risks, and add operational burden.

Solution: Attach an IAM Role to your EC2 instance (or use IRSA on EKS). Your Node.js app automatically gets temporary credentials.

Result: No keys in code. No env vars. No management. AWS handles credential rotation for you.

If you're still hardcoding keys, stop. Instance Profiles solve this problem completely.

Tags: AWS · Security · Node.js · IAM · Backend · Infrastructure · BestPractices · Credentials

I Panicked Over Nothing (And It Took Me An Hour To Realize)

Atharva Unde — Wed, 20 May 2026 00:00:00 +0000

The Setup (Where I Thought I Was Smart)

Four years ago, I was fresh at my job. Still figuring out Kubernetes. Still learning DevOps. But I was eager to prove I knew what I was doing.

So I volunteered to set up the lower environment cluster on AWS. You know, the cluster developers use for testing before pushing to production.

I built it from scratch:

# Infrastructure Stack for EKS Migration:
- EKS cluster with eksctl YAML
- Private subnets for worker nodes
- NAT gateway for egress
- Microservices deployed (being tested by developers)
- IPv4-only VPC
- NGINX Ingress Controller
- AWS ACM for SSL/TLS
- Network Load Balancer
- CloudFront distribution in front
- Route 53 A record pointing to CloudFront

And it worked. Everything worked perfectly. I was feeling like a real DevOps engineer. Then the messages started.

The Incident

We were migrating from EC2 to Kubernetes. I'd set up the EKS cluster and asked developers to test the microservices while I finished deploying the rest.

They reported the domain wouldn't resolve. On certain networks, browsers showed ERR_NAME_NOT_RESOLVED or DNS_PROBE_FINISHED_NXDOMAIN. But it worked fine from my machine and office network.

Some networks: broken. Others: fine.

I was sweating.

The Panic Phase

Everything checked out:

Application logs → Fine
Kubernetes events → Normal
Cluster health → Good
Load balancer → Healthy
Route 53 DNS records → All there
CloudFront settings → Correct

But developers were still getting resolution failures. Works from some networks (IPv4), fails from others (IPv6-only).

The pattern was screaming at me. I just wasn't listening.

The Rabbit Hole

I debugged the symptom, not the problem.

First 15 minutes: checked DNS. Records were there. CNAME was correct. CloudFront alias configured. But I kept debugging because that's where the error came from.

Next 20 minutes: tried different DNS providers. Changed to Google Public DNS. Flushed cache. Still broken from some networks, fine from others.

Next 15 minutes: blamed CloudFront. Checked everything. Tried cache invalidation. Tried recreating the distribution.

Then I started spiraling. Route 53 routing policy? Load balancer misconfigured? Rebuild the cluster?

One hour in. No progress. Just deeper into the wrong layer.

The Moment I Stopped (And Actually Thought)

Then it hit me: Why does it work from network A but not network B?

That's not a DNS question. That's not a CloudFront question. That's a network connectivity question.

I'd been debugging the wrong layer the whole time.

So I did what I should have done 45 minutes earlier. I checked my network:

# From the problematic network:
$ curl https://ipv4.icanhazip.com
# (no response, timeout)

$ curl https://ipv6.icanhazip.com
# 2409:XXXX:XXXX::1 ✓

That's when it hit me. I only had IPv6 from that network. No IPv4. But my entire cluster was IPv4-only.

The Realization

I had only created an A record (IPv4) in Route 53. Developers on IPv6-only networks had no way to resolve the domain.

Simple:

IPv6-only developer
    ↓
Looks up mydomain.com
    ↓
Route 53 returns only A record
    ↓
"I don't have IPv4, can't use this"
    ↓
ERR_NAME_NOT_RESOLVED

The answer was staring at me. I was debugging the wrong layer.

The Fix

Add an AAAA record (IPv6) to Route 53 pointing to CloudFront.

mydomain.com
  A record → CloudFront
  AAAA record → CloudFront

Now IPv6-only clients resolve the domain and connect to CloudFront over IPv6 (provided IPv6 is enabled on the distribution), while CloudFront talks to the IPv4-only origin. Took 5 minutes to fix. Took 55 minutes to find.
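A check like the one below would have surfaced the gap in seconds. It uses Node's built-in `dns.promises.resolve4` and `resolve6`. The function names and verdict strings are my own, and treating every lookup error as "no records" is a simplifying assumption.

```javascript
const dns = require("node:dns").promises;

// Pure part: classify a hostname from the records that came back.
function interpretRecords(a, aaaa) {
  if (a.length > 0 && aaaa.length > 0) return "dual-stack";
  if (a.length > 0) return "ipv4-only"; // IPv6-only clients get no usable address
  if (aaaa.length > 0) return "ipv6-only";
  return "unresolvable";
}

// Network part: query A and AAAA records; errors count as "no records".
async function checkHostname(hostname) {
  const safe = (promise) => promise.catch(() => []);
  const [a, aaaa] = await Promise.all([
    safe(dns.resolve4(hostname)),
    safe(dns.resolve6(hostname)),
  ]);
  return interpretRecords(a, aaaa);
}

// Usage (needs network access):
// checkHostname("mydomain.com").then(console.log);
```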

The Lesson

When something breaks unevenly, don't debug the error. Debug the difference.

I focused on the symptom (DNS resolution failed). I should have asked: why does it work from network A but not network B?

The difference tells you the problem:

Works: Networks with IPv4
Fails: Networks with IPv6 only
Problem: Only IPv4 DNS records

One question. That's all it took.
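"Debug the difference" can be made mechanical: write down what you know about a working environment and a failing one, then list only what differs. The attribute names and values below are hypothetical, and `diffEnvironments` is an illustrative helper.

```javascript
// List the attributes whose values differ between a working and a failing environment.
function diffEnvironments(working, broken) {
  const keys = new Set([...Object.keys(working), ...Object.keys(broken)]);
  return [...keys].filter((key) => working[key] !== broken[key]);
}

const officeNetwork = { dnsResolver: "8.8.8.8", browser: "Chrome", hasIPv4: true, hasIPv6: true };
const failingNetwork = { dnsResolver: "8.8.8.8", browser: "Chrome", hasIPv4: false, hasIPv6: true };

console.log(diffEnvironments(officeNetwork, failingNetwork)); // [ 'hasIPv4' ]
```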

Most engineers debug vertically, going deeper into the same layer. Good engineers debug horizontally: they find what's different between working and broken.

What You Should Know

When something breaks:

Don't panic. Seriously. Panic makes you stupid.
Check the obvious (logs, configs, health).
If that's fine, zoom out. What's different between working and broken?
Debug from the affected user's perspective, not your machine.

Wrong vs Right:

WRONG: "The error says DNS_PROBE_FINISHED_NXDOMAIN, so I'll debug DNS"
RIGHT: "Works from IPv4 networks, fails from IPv6 networks. Why?"

The Timeline

14:30 - Error report
14:35 - Check logs (fine)
14:40 - Check Kubernetes (fine)
14:50 - Blame DNS (waste 15 min)
15:05 - Try different DNS provider (waste 15 min)
15:20 - Blame CloudFront (waste 15 min)
15:35 - Finally ask: "Why does it work here but not there?"
15:40 - Realize: only A record, no AAAA record
15:45 - Add AAAA record to Route 53
15:50 - Test: Works

Total: 1 hour 20 minutes
Time wasted debugging wrong layer: 45 minutes
Time to actually fix: 5 minutes

The Real Lesson

You'll have incidents. You'll panic. That's normal.

The engineers who get ahead stop in the middle of the panic and ask: "Am I looking at the problem or the symptom?"

Symptom: DNS resolution failed

Problem: Missing AAAA record (IPv6 DNS)

I spent an hour on the symptom. Five minutes on the problem would have fixed it.

TL;DR

Problem: EKS cluster worked from IPv4 networks, failed from IPv6-only networks.

My Mistake: Debugged DNS, CloudFront, Kubernetes. All the wrong layers.

Root Cause: Only created A record (IPv4). No AAAA record (IPv6).

Fix: Add AAAA record to Route 53.

Pattern: When something breaks unevenly, debug the difference, not the error.

Tags: DevOps · Kubernetes · AWS · EKS · Debugging · IPv6 · IPv4 · CloudFront · Network · Lessons Learned · Infrastructure

Communication Is a Skill Engineers Can't Afford to Ignore

Atharva Unde — Sun, 17 May 2026 00:00:00 +0000

The Scenario

Imagine you work at a company that builds online courses. You've got an in-house learning management system (LMS) hosting proprietary course content. Videos live in S3 buckets and various video providers, protected by a DRM service managed by a third-party vendor.

One day, someone walks into your office: "Before you joined, our team wasn't well-coordinated. It's possible the DRM provider also bundles video storage we don't use. Can you check if any of our videos ended up there?"

Simple question. Messy answer.

What Actually Happened

You dig in and discover: A founding team member uploaded videos to the vendor's storage as a temporary measure, then left. No documentation. No tracking.

Weeks later, the vendor experiences infrastructure issues. Their problems cascade upstream to your DRM service, which breaks your LMS. The chain of causation is so tangled it takes hours to figure out why everything went down.

Now you need to tell three different people about this. And each one needs a completely different explanation.

For Your Engineering Team: Technical Precision

Your team needs the exact failure chain. This goes in your internal KB.

INCIDENT: LMS Outage – Root Cause Analysis

TIMELINE:
- 14:32 UTC: DRM service returned 503 errors
- 14:45 UTC: LMS course playback failed for all users
- 15:10 UTC: Incident team identified DRM service logs showing upstream failures
- 15:30 UTC: Traced to [Third-Party Vendor] S3 storage degradation

ROOT CAUSE:
Videos stored in multiple undocumented locations:
- Primary: Company S3 bucket (monitored, failover)
- Undocumented: Vendor managed storage (no monitoring, no alerting)

IMPACT:
- 6-hour outage affecting 2,400 active learners
- ~$8K revenue loss
- 47 support escalations

RESOLUTION:
1. Consolidated all videos to primary S3 bucket
2. Deprovisioned vendor storage
3. Implemented CloudFront caching layer
4. Added automated inventory checks to CI/CD

PREVENTION:
- Infrastructure audit scheduled
- Documentation required for all storage decisions
- Monitoring alerts for upstream DRM health

This is detailed. Technical. It answers the "how" and "why" for people building the system.

For Your Management Team: Business Impact

Your non-technical stakeholders don't care about S3 buckets or DRM protocols. They care about impact and action.

SUBJECT: LMS Outage – Summary & Next Steps

We experienced a 6-hour LMS outage this afternoon affecting ~2,400 students.

WHAT HAPPENED:
Our video storage system had videos in two places instead of one. When the
secondary location had problems, it broke course delivery because of how our
DRM is configured. This secondary location wasn't documented when originally set up.

BUSINESS IMPACT:
- 2,400 students unable to access courses (6 hours)
- ~$8K estimated revenue impact
- 47 customer support tickets

WHAT WE'RE DOING:
- Consolidated storage to a single monitored location
- Added safeguards against configuration drift
- Conducting full infrastructure review

TIMELINE:
All systems operational as of 8:45 PM. No further issues expected.

Short. Factual. Focused on impact and next steps. Zero jargon.

For Your Customers: Accountability

Your customers don't care about your infrastructure. They care that their courses were down. Keep it brief, take responsibility, show you fixed it.

SUBJECT: Service Restored – Course Access

We experienced a service interruption this afternoon (2:30 PM – 8:45 PM UTC)
that prevented course access.

WHAT HAPPENED:
A configuration in our video delivery system became misaligned, affecting
course playback. We've identified and resolved the root cause.

WHAT WE DID:
- Restored full service at 8:45 PM
- Consolidated video storage to prevent similar issues
- Added monitoring to catch problems faster

FOR YOUR ACCOUNT:
You have not been charged for downtime. Your course progress is intact.
You can resume immediately.

Thank you for your patience.

Short. Accountable. Solution-focused. No technical noise.

The Real Skill

You're dealing with the same incident. But:

Your team needs technical details to prevent it happening again
Your management needs business impact to make resource decisions
Your customers need reassurance and accountability

Engineers who get promoted aren't always the ones who build the cleverest systems. They're the ones who can explain what they've built, and why it matters, to anyone in the room.

That's not soft skill theater. That's infrastructure.

TL;DR

Same incident, three audiences, three completely different explanations. Master that shift and you'll communicate like an engineer who actually ships things.

Tags: communication · technical writing · stakeholder management · incident communication · career development · team leadership · infrastructure · documentation

The State of Hiring, AI, and 2025

Atharva Unde — Sun, 19 Oct 2025 00:00:00 +0000

I'm a 2020 Computer Engineering graduate, part of the first batch that had to navigate the chaos of the COVID-19 pandemic during graduation. Even back then, I felt that the hiring process, both in India and globally, was outdated.

Now, let me clarify. I don't hold anything against people who swear by DSA (Data Structures and Algorithms), competitive programming, or LeetCode-style assessments. What I'm sharing here is simply my perspective based on personal experiences.

My Journey: Not a DSA Kid

I never got into DSA or competitive coding. Partly due to a lack of mentorship in college, and partly because I was more interested in getting my hands dirty with real technologies and actually building things.

Back in the pre-ChatGPT era, Stack Overflow and Server Fault were our lifelines. We'd spend hours debugging an issue, desperately searching for the one solution that worked. That process, though tedious, taught me how to solve problems.

When I started my career, I was rejected by a few companies in the final coding or tech rounds simply because my DSA wasn't "strong enough." Writing palindrome checkers or reversing strings never really excited me. My thought was: why obsess over that when the internet already has those answers? What matters is knowing how to apply solutions to real problems.

The Problem With the Current Hiring System

Even today, the hiring process for tech roles remains fragmented and, honestly, quite outdated. I've been involved in hiring over the last few months and I've seen both sides of the equation.

When I interview, I rarely ask DSA questions. For a Node.js backend developer role, for instance, I usually ask the candidate to:

Build a simple Express server
Connect it to MongoDB using Mongoose
Implement basic CRUD operations
Handle errors and edge cases, then move on to aggregation pipelines

They can refer to official docs or npm pages. Still, I've seen candidates struggle to even set up a basic Express server.

Key Insight: We're in 2025, in the age of ChatGPT, Copilot, and LLMs. Writing code has become easier than ever. What truly matters now isn't whether someone can memorize syntax, but whether they can think through problems and find the right solutions.

The Real Skills We Should Be Hiring For

1. Problem-Solving Ability

Anyone can write code with AI's help. The differentiator is how well a person can break down a problem, analyze trade-offs, and come up with a solution that makes sense.

2. Search Skills

You'd be surprised how many developers can't simply "Google" effectively. I've seen juniors write full-paragraph queries into search boxes and get frustrated with irrelevant results. Searching or "prompting" now is a real skill.

3. Reading Documentation

This is another lost art. I once sat with a junior developer (with 3 years of experience) who couldn't identify the accepted answer on a Stack Overflow page. It wasn't a knowledge problem - it was a reading comprehension problem.

A Modern, AI-Aware Hiring Process

Post a job requirement with a short take-home challenge for front-end, back-end, or DevOps.
Give candidates time to complete it using any tools or AI assistants they want.
In the first interview round, ask them to run their code, make small changes live, and explain their thought process.
Discuss scalability and problem scenarios - "What happens if traffic spikes?", "How would you handle data consistency?", etc.
Review past projects - see how they've thought through architecture, trade-offs, and delivery.

Instead of spending endless hours on multi-round interviews, I'd even suggest hiring candidates on a trial basis for 15 days. Give them a real problem to solve. See how they perform when faced with an actual requirement, like implementing a secure login system.

What Separates Good Developers

Anyone can ask ChatGPT to generate a login page, but a skilled developer knows:

It needs authentication checks

OAuth requires privacy policy and ToS links

Idle sessions should time out

Security headers must be configured

That level of awareness doesn't come from solving string reversals - it comes from building real systems and understanding user needs.

It's Time to Rethink Hiring

We're at a point where the ability to write code isn't the bottleneck anymore. AI has taken care of that. The real test is how people think, learn, and adapt their judgment.

If companies keep hiring like it's 2015, they'll miss out on great talent that can think creatively and work smartly with AI.

Tags: AI hiring · recruitment trends 2025 · tech talent market · DevOps jobs · engineering hiring · AI in recruitment · developer hiring · talent market 2025 · career insights

Docker: Layers, Caching, Multi-Stage Explained

Atharva Unde — Sat, 15 Feb 2025 12:55:45 +0000

Docker's efficiency is one of its biggest draws. But what makes Docker builds so fast? The secret lies in its layer-based architecture and clever caching mechanism. Let's dive in and see how it all works.

Dockerfiles: A Layered Cake

Every line in your Dockerfile is an instruction, and Docker treats each of these instructions as a distinct layer. But what is a layer, exactly?

Think of it like this: a layer is an intermediate snapshot of your container image during the build process. Each instruction in the Dockerfile creates a new layer, building upon the previous one.

For instance, consider this common Node.js Dockerfile:

FROM node:18
WORKDIR /app
COPY package.json .
RUN npm install
COPY . .
CMD ["node", "server.js"]

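This simple Dockerfile has six instructions. The first five each produce a step that Docker caches as a layer; the sixth, CMD, only records the default command in the image metadata and adds nothing to the filesystem: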

Base Image Layer: FROM node:18 (The foundation upon which everything else is built)
Working Directory Layer: WORKDIR /app (Sets the working directory inside the container)
Dependency Definition Layer: COPY package.json . (Copies the package.json file)
Dependency Installation Layer: RUN npm install (Installs the project dependencies)
Application Code Layer: COPY . . (Copies the entire application code)

Each of these instructions results in a distinct layer that's stored in the image.

Docker's Caching Superpower

Here's where the magic happens: Docker caches each of these layers during the build process. This means that if a layer hasn't changed, Docker can reuse the cached version instead of rebuilding it from scratch. This dramatically speeds up subsequent builds.

Cache Hit: If an instruction and its inputs haven't changed, Docker pulls the existing layer from the cache.
Cache Miss: If an instruction or its inputs have changed, Docker invalidates the cache for that layer and all subsequent layers. This means it needs to rebuild not only the changed layer but also every layer that comes after it.

Cache Invalidation: When Things Go Wrong

The cache invalidation behavior is crucial to understand. Imagine you have a Dockerfile with eight instructions. If instruction #2 changes, Docker invalidates the cache for instruction #2 and all instructions that follow (3 through 8). They will all need to be rebuilt. This can lead to longer build times if not managed correctly.
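The cascade described above can be sketched in a few lines of Python. This is a simplified model, not Docker's actual cache-key algorithm: it assumes each step's key is a hash of the previous key, the instruction text, and (for COPY) the contents of the copied files. The names layer_keys and rebuilt_steps, and the eight-instruction Dockerfile, are made up for illustration.

```python
import hashlib

def layer_keys(instructions, files):
    """Return one cache key per instruction, each chained to the one before."""
    keys, parent = [], ""
    for instruction in instructions:
        h = hashlib.sha256()
        h.update(parent.encode())       # everything that came before
        h.update(instruction.encode())  # the instruction text itself
        words = instruction.split()
        if words[0] == "COPY":
            # Copied file contents are inputs too (last word is the destination).
            for name in words[1:-1]:
                h.update(files.get(name, "").encode())
        parent = h.hexdigest()
        keys.append(parent)
    return keys

def rebuilt_steps(old_keys, new_keys):
    """1-based numbers of the instructions whose key changed."""
    return [i for i, (old, new) in enumerate(zip(old_keys, new_keys), start=1) if old != new]

dockerfile = [
    "FROM node:18", "WORKDIR /app", "COPY package.json .", "RUN npm install",
    "COPY index.js .", "ENV PORT=3000", "EXPOSE 3000", 'CMD ["node", "index.js"]',
]
files = {"package.json": '{"name": "demo"}', "index.js": "console.log('hi')"}
before = layer_keys(dockerfile, files)

edited = list(dockerfile)
edited[1] = "WORKDIR /srv"  # change instruction #2
print(rebuilt_steps(before, layer_keys(edited, files)))  # [2, 3, 4, 5, 6, 7, 8]

new_files = {**files, "index.js": "console.log('bye')"}
print(rebuilt_steps(before, layer_keys(dockerfile, new_files)))  # [5, 6, 7, 8]
```

Because every key includes its parent's key, a change anywhere flows down to every later step, which is why the order of instructions matters so much.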

A Real-World Example (Multi-Stage Build and Labels)

Let's examine a more complex scenario involving a multi-stage Dockerfile, which is a best practice for creating smaller and more secure images:

FROM node:20-alpine AS build-env
WORKDIR /app
COPY package.json yarn.lock ./
ENV NODE_ENV=production
RUN yarn install --frozen-lockfile --production
COPY index.js ./

FROM gcr.io/distroless/nodejs20-debian12
WORKDIR /app
LABEL org.opencontainers.image.authors="authoremail@example.com"
LABEL "com.example.vendor"="Example LLC"
LABEL version="1.0.0"
LABEL description="This image is used to run hello world backend written in Express Framework"
COPY --from=build-env /app /app
CMD ["index.js"]

In this Dockerfile, we have two stages:

build-env Stage: This stage uses a Node.js Alpine image to install dependencies and prepare the application for production.
Final Stage: This stage uses a distroless image (gcr.io/distroless/nodejs20-debian12), which contains only the necessary runtime dependencies.

Here's how caching works in this multi-stage context:

Independent Caches: Each stage has its own separate cache. Changes in one stage don't automatically invalidate the cache of other stages, unless they affect the COPY --from instruction (which we'll discuss below).
build-env Stage Changes: If you modify package.json or yarn.lock, the COPY package.json yarn.lock ./ layer changes, so RUN yarn install and every later instruction in that stage are rebuilt.
COPY --from Interaction: The COPY --from=build-env /app /app instruction is crucial. If the contents of /app in the build-env stage change (due to a rebuild triggered by a change in package.json, for example), the COPY instruction will also produce a different result in the final stage, invalidating the final stage's cache from that point onward.
Label Invalidation: LABEL instructions only add metadata and never affect earlier layers. Changing a label value does invalidate that LABEL instruction, and because invalidation cascades, the instructions after it in the same stage (here COPY --from and CMD) may be rebuilt too, depending on the builder.
Code Changes: If you only modify index.js, just the COPY index.js ./ instruction in build-env and the subsequent COPY --from instruction in the final stage are affected. The dependency installation step (RUN yarn install) is still pulled from the cache, which speeds up the build significantly.
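The cross-stage behaviour above can be sketched with a small model. It is an illustration under stated assumptions, not how Docker computes keys internally: build-env steps are keyed on the previous key plus their inputs, and the final stage's COPY --from is keyed on the content of /app, which in this model is fully determined by the three input files. LABEL handling is left out because it depends on the builder. The function names are hypothetical.

```python
import hashlib

def digest(*parts):
    """Hash a sequence of strings into one hex digest."""
    h = hashlib.sha256()
    for part in parts:
        h.update(part.encode())
        h.update(b"\0")  # separator so ("ab", "c") differs from ("a", "bc")
    return h.hexdigest()

def cache_keys(files):
    """Return {step: key} for a simplified model of the two-stage build."""
    keys = {}
    # build-env stage: each key chains the previous key, the instruction
    # text, and the contents of any files the instruction copies.
    parent = digest("FROM node:20-alpine")
    steps = [
        ("build-env: COPY package.json yarn.lock", ["package.json", "yarn.lock"]),
        ("build-env: RUN yarn install", []),
        ("build-env: COPY index.js", ["index.js"]),
    ]
    for step, inputs in steps:
        parent = digest(parent, step, *(files[name] for name in inputs))
        keys[step] = parent
    # Final stage: COPY --from is keyed on what /app contains.
    app_content = digest(files["package.json"], files["yarn.lock"], files["index.js"])
    base = digest("FROM gcr.io/distroless/nodejs20-debian12")
    keys["final: COPY --from=build-env /app /app"] = digest(base, app_content)
    return keys

def rebuilt(before, after):
    """List the steps whose cache key changed between two builds."""
    old, new = cache_keys(before), cache_keys(after)
    return [step for step in old if old[step] != new[step]]

files = {"package.json": "v1", "yarn.lock": "v1", "index.js": "console.log(1)"}
code_change = {**files, "index.js": "console.log(2)"}
dep_change = {**files, "yarn.lock": "v2"}

print(rebuilt(files, code_change))  # COPY index.js and COPY --from only
print(rebuilt(files, dep_change))   # every step in the model
```

A code-only change leaves the yarn install key untouched, which is the whole point of copying package.json and yarn.lock before the application code.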

Docker Caching and Multi-Stage Builds: Scenario Table

This table outlines how different changes to your Dockerfile or application code impact the caching mechanism in a multi-stage build.

Scenario	Changed File/Instruction	Impact on `build-env` Stage Cache	Impact on Final Stage Cache	Rebuilt Layers
Dependency Change	`package.json` or `yarn.lock`	`RUN yarn install` and subsequent instructions are invalidated.	`COPY --from=build-env /app /app` and subsequent instructions invalidated.	All layers from `RUN yarn install` in `build-env`, and from `COPY --from` in the final stage
Code Change Only	`index.js`	Only `COPY index.js ./` is invalidated.	`COPY --from=build-env /app /app` and subsequent instructions invalidated.	`COPY index.js ./` in `build-env`, and from `COPY --from` in the final stage
Dockerfile Change (build-env, Before COPY package.json)	(e.g., adding a new `ENV` variable before COPY)	All instructions after and including the changed instruction are invalidated.	If the content of /app does not change, the final stage stays cached	All layers from that step to end of `build-env`
Dockerfile Change (build-env, After COPY package.json)	(e.g., adding a `RUN` after the COPY)	All instructions after and including the changed instruction are invalidated.	`COPY --from=build-env /app /app` and subsequent instructions invalidated.	All layers from the changed instruction to the end of `build-env`, and from `COPY --from` in the final stage
Label Value Change	(Change in LABEL instruction in the final stage)	No impact.	The modified `LABEL` instruction is invalidated; depending on the builder, later instructions in the stage (`COPY --from`, `CMD`) may be rebuilt too.	The `LABEL` step and possibly the steps after it in the final stage
No Changes	N/A	All layers are pulled from cache.	All layers are pulled from cache.	None

Explanation:

Scenario: Describes the type of change made.
Changed File/Instruction: Specifies the file or instruction that was modified.
Impact on build-env Stage Cache: Explains which layers in the build-env stage are invalidated.
Impact on Final Stage Cache: Explains which layers in the final stage are invalidated.
Rebuilt Layers: Lists the layers that will be rebuilt during the Docker build process.

The Takeaway: Order and Multi-Stage Considerations

With multi-stage builds, you need to consider caching within each stage, as well as how changes in one stage affect subsequent stages through COPY --from instructions. Strategic placement of instructions and careful management of dependencies are key to maximizing build performance.

In the next section, we will explore best practices to optimize caching and reduce unnecessary rebuilds. Stay tuned!

Dockerfile Labels: A Comprehensive Guide

Atharva Unde — Sat, 08 Feb 2025 08:02:10 +0000

In our previous post, we explored how to significantly optimize Docker image size. Now, let's dive deeper and enhance our images with valuable metadata using Docker labels.

Remember this Dockerfile from last time?

FROM node:20-alpine AS build-env
WORKDIR /app

COPY package.json yarn.lock ./
ENV NODE_ENV=production
RUN yarn install --frozen-lockfile --production
COPY index.js ./

FROM gcr.io/distroless/nodejs20-debian12
WORKDIR /app

COPY --from=build-env /app /app
CMD ["index.js"]

If we inspect this image and check its labels using the command:

docker image inspect --format='{{json .Config.Labels}}' atharvaunde/dockerexamples:distroless

We'll see... null. So, what's the point of Docker labels, and why should we care?

Why Use Docker Labels?

Docker labels are essentially metadata tags that you can add to your Docker images. Think of them as key-value pairs that provide extra information about the image.

Use Cases for Docker Labels

Attribution: Who built this image? Include the author's name and email.
Organization: Which company or team created this image?
Version Tracking: What version of the application is contained within? This is especially useful if you consistently tag images as latest.
Description: Provide a short description of the image's purpose.
Custom Metadata: Add any other relevant information, such as dependencies, build dates, or license details.

By embedding this information directly into the image, you provide valuable context to users who consume your images.

Adding Labels to Our Dockerfile

Let's enhance our Dockerfile with labels to include the author, company, version, and a description:

FROM node:20-alpine AS build-env
WORKDIR /app
COPY package.json yarn.lock ./
ENV NODE_ENV=production
RUN yarn install --frozen-lockfile --production
COPY index.js ./

FROM gcr.io/distroless/nodejs20-debian12
WORKDIR /app
LABEL org.opencontainers.image.authors="authoremail@example.com"
LABEL "com.example.vendor"="Example LLC"
LABEL version="1.0.0"
LABEL description="This image is used to run hello world backend written in Express Framework"
COPY --from=build-env /app /app
CMD ["index.js"]

Now, when we run the same inspection command:

docker image inspect --format='{{json .Config.Labels}}' atharvaunde/dockerexamples:distroless

We'll see the labels we added:

{
  "com.example.vendor": "Example LLC",
  "description": "This image is used to run hello world backend written in Express Framework",
  "org.opencontainers.image.authors": "authoremail@example.com",
  "version": "1.0.0"
}

Important Placement in Multi-Stage Builds

When using multi-stage Dockerfiles like ours, it's crucial to place the LABEL instructions in the final stage (i.e., the one that creates the actual image you'll be distributing). If we had placed the LABEL instructions before the FROM gcr.io/distroless/nodejs20-debian12 line, they would be lost in the final image. The final image is created in a separate stage with separate context, so it wouldn't inherit labels from any previous stage.

Docker Label Inheritance and Overriding

This table summarizes how labels are handled when a Dockerfile adds a label that either exists or doesn't exist in the base image.

Scenario	Base Image Label	Dockerfile Label	Resulting Image Label
Inheritance: Label Exists in Base Image	Exists	Absent	Retained (from base)
Overriding: Label Exists in Base Image	Exists	Present	Overridden (Dockerfile value)
Addition: Label Not in Base Image	Absent	Present	Added (Dockerfile value)
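The three rows of the table behave like a dictionary merge where the Dockerfile's values win. Here is a minimal Python sketch of that rule; the function name resulting_labels and the base image labels are hypothetical, chosen only to exercise all three scenarios.

```python
def resulting_labels(base_labels, dockerfile_labels):
    """Merge labels the way the table describes: Dockerfile values win."""
    merged = dict(base_labels)        # inheritance: start from the base image
    merged.update(dockerfile_labels)  # overriding and addition
    return merged

# Hypothetical labels carried by the base image of the final stage.
base = {"maintainer": "base-team@example.com", "version": "0.9.0"}
# Labels set by LABEL instructions in our Dockerfile.
ours = {"version": "1.0.0", "description": "hello world backend"}

result = resulting_labels(base, ours)
print(result)
```

Here maintainer is inherited from the base, version is overridden with our value, and description is added.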

In Conclusion:

Docker labels are a simple yet powerful way to add metadata to your Docker images, providing valuable context and improving discoverability. Remember to place your LABEL instructions in the correct stage of a multi-stage Dockerfile and be aware of how label inheritance works. Happy containerizing!

Base Images: The Secret to Smaller Docker Images

Atharva Unde — Sun, 02 Feb 2025 12:30:00 +0000

In our previous post, we walked through creating a basic Dockerfile. However, we noticed a significant issue: the resulting image for our simple "Hello World" app was a hefty 1.62GB (uncompressed)! That’s not ideal for efficient deployment and resource utilization.

Today, we're diving into how to dramatically reduce your Docker image size by carefully choosing the right base image. You might be surprised by how much impact this single decision can have.

The Impact of Your Base Image

When you build a Docker image, you're essentially layering instructions on top of a foundational image – the base image. All your commands, dependencies, and application code get added on to this base. Consequently, the size and contents of your base image have a direct impact on the final size of your container image.

As an example, in our previous attempt, using FROM node:latest resulted in a 1.62GB (uncompressed) image. That's a lot of bloat for a tiny Node.js application!

Just by switching our base image to FROM node:alpine, we saw the size drop to 244MB (uncompressed). That's a huge improvement, but can we do better? Absolutely!

Understanding Different Base Image Types

Let's explore the common types of base images and when to consider using them:

Standard Images: These are the full-fledged OS-based images like Ubuntu, Debian, or others. They come packed with a wide range of libraries and tools. While convenient, these images tend to be large due to all the extra baggage they carry. Unless you're unsure about your application's OS dependencies or are in a real rush, it's best to avoid them for production containers due to their size and resource consumption.
Alpine Images: These images are based on the super lightweight Alpine Linux distribution. They are much smaller than standard images because they contain only the bare minimum packages needed to run your application. They are ideal for most use-cases and are one of the best choices when starting out with Docker optimization. However, be sure to test your application thoroughly when first switching to Alpine images, as they might lack OS-level dependencies that your application unexpectedly relies on.
Slim Images: While the name might suggest otherwise, slim images can sometimes be larger than Alpine images, but still much smaller than standard ones. They often include only the packages and dependencies required to run specific applications, so it's worth exploring these images if their package set fits your use case. They can be based on various distributions like Alpine, CentOS, or Debian.
Distroless Images: These are specially designed for multi-stage Docker builds. They contain only your application and its runtime dependencies, excluding package managers, shells, and other common Linux utilities. This makes them incredibly small and helps improve the security posture of your containers. Distroless images are perfect for production deployments after you understand the entire application stack and dependencies.

As Google, the creator of distroless images, puts it, "Distroless images contain only your application and its runtime dependencies." You can read more about them here.

Choosing the Right Image: A Practical Approach

Deciding which image to use might seem daunting, but a simple approach helps here:

Start with Alpine: Begin with an Alpine-based image like node:alpine for Node.js applications.
Inspect and Verify: Examine the image's layers and included packages on Docker Hub. This can give you insights into the image's composition. For example, you can check the layers of a specific image tag like node:current-alpine3.20 here.
Add Dependencies Manually: If an Alpine image lacks required dependencies, you can install them manually within your Dockerfile. You can even create your own custom base image from scratch if needed.
Advance to Distroless: For production, after thorough testing, and when the full application dependencies are well-known, consider using distroless images for maximum size reduction and security.

Let's Compare Image Sizes

To illustrate the point, let's take a look at the image sizes we saw in our example, using different base images. We'll show both uncompressed and compressed sizes for comparison:

Base Image & Build Setup	Image Tag	Uncompressed Docker Image	Compressed Docker Image
`node:20-alpine` (build) & `gcr.io/distroless/nodejs20-debian12`	`mycontainer:distroless`	191 MB	49.61 MB
`node:slim`	`mycontainer:slim`	364 MB	78.52 MB
`node:alpine`	`mycontainer:alpine`	244 MB	56.68 MB
`node:latest`	`mycontainer:default`	1.62 GB	381.73 MB

Note: The uncompressed size can be checked by using the command docker image ls | grep mycontainer. The compressed size can be seen when the image is pushed to a container registry or by using docker save mycontainer:distroless | gzip -c | wc -c to see the compressed file size in bytes.
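To put the table in relative terms, here is a small Python sketch that computes how much smaller each image is than the node:latest build. The numbers are copied from the table, with 1.62 GB taken as 1620 MB (an assumption about the unit convention); reduction_pct is a helper name made up for this example.

```python
def reduction_pct(baseline_mb, size_mb):
    """Percentage saved relative to the baseline size."""
    return 100 * (1 - size_mb / baseline_mb)

# (uncompressed MB, compressed MB) from the table; 1.62 GB is taken as 1620 MB.
sizes = {
    "mycontainer:default": (1620, 381.73),
    "mycontainer:slim": (364, 78.52),
    "mycontainer:alpine": (244, 56.68),
    "mycontainer:distroless": (191, 49.61),
}

base_raw, base_packed = sizes["mycontainer:default"]
for tag, (raw, packed) in sizes.items():
    print(f"{tag}: {reduction_pct(base_raw, raw):.1f}% smaller uncompressed, "
          f"{reduction_pct(base_packed, packed):.1f}% smaller compressed")
```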

What's the difference between compressed and uncompressed size?

Optimization through Distroless (Multi-stage Build)

Now, let's see how to optimize it even further using a Distroless image. Here's an updated Dockerfile:

FROM node:20-alpine AS build-env
WORKDIR /app

COPY package.json yarn.lock ./
ENV NODE_ENV=production
RUN yarn install --frozen-lockfile --production
COPY index.js ./

FROM gcr.io/distroless/nodejs20-debian12
WORKDIR /app

COPY --from=build-env /app /app
CMD ["index.js"]

In this approach:

We use node:20-alpine as a builder image to install our dependencies, copy the source, and prepare the application for production.
We then copy all necessary artifacts (/app) into a distroless image gcr.io/distroless/nodejs20-debian12 which only contains the absolute runtime requirements.

This method creates a final image with a minimal footprint, as demonstrated in the table.

Conclusion

Choosing the right base image is a critical step in optimizing your Docker images. Start with Alpine images, manually add needed dependencies, and ultimately aim for distroless images in production. By understanding the trade-offs of each image type, you can greatly reduce your image size, improving efficiency, performance, and resource consumption. Stay tuned for more Docker tips and tricks in future blog posts!

The table clearly highlights the dramatic size difference when using different base images and build strategies. The multi-stage distroless approach yields the smallest final image, making it ideal for production deployments.

Understanding the Difference: Compressed vs. Uncompressed Docker Image Sizes

Atharva Unde — Sat, 01 Feb 2025 13:40:36 +0000

When you build a Docker image, you're essentially creating a series of layers, each representing a step or instruction in your Dockerfile. These layers build on top of each other, forming the final image. Let's break down what the compressed and uncompressed sizes mean in this context:

Uncompressed Image Size

What it is: The uncompressed size is the total size of all the layers that make up your Docker image as they are stored, unpacked, on your local machine. Think of it as the raw, unoptimized size of your image.
Why it's larger: This size is usually much larger because:
- Every Layer Counts: Each instruction that changes the filesystem (RUN, COPY, ADD) adds a layer, and the reported size is the sum of all of them. Files deleted in a later layer still take up space in the earlier layer that added them.
- Duplicate Data: When you copy files into your image, Docker creates a new layer even if some data is repeated from previous layers. For example, if you copy a directory in an early step and copy it again later with minor changes, both copies are counted.
- Unoptimized Storage: The uncompressed size doesn't take advantage of compression or any optimized storage techniques.

Compressed Image Size

What it is: The compressed size is the size of your Docker image after it has been processed and optimized by Docker for storage and transfer. This is the size you'll see when the image is pushed to a registry like Docker Hub or when you manually save it using docker save with compression.
How it's smaller: Docker applies several optimizations during this process:
- Gzip Compression: The most significant optimization is that Docker uses gzip to compress the individual layers of your image. This significantly reduces the amount of storage space required.
- Layer Deduplication: Layers are identified by a digest of their content, so identical layers are stored and transferred only once. If two images share the same base image layers, the registry keeps a single copy and both images reference it. This works per layer, not per file.
- Optimized Storage: Each layer holds only the changes relative to the layer beneath it, so space is used only for the differences between layers.
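Layer deduplication is easier to picture with a toy content-addressed store. The Python sketch below is an illustration only: LayerStore is a made-up class, layers are plain byte strings, and the digest is taken over the raw bytes, whereas a real registry works with compressed layer blobs.

```python
import hashlib

class LayerStore:
    """Toy content-addressed store: one blob per distinct layer digest."""

    def __init__(self):
        self.blobs = {}

    def push(self, layers):
        """Store an image's layers; return how many were actually uploaded."""
        uploaded = 0
        for layer in layers:
            digest = hashlib.sha256(layer).hexdigest()
            if digest not in self.blobs:  # unseen content: store it
                self.blobs[digest] = layer
                uploaded += 1
        return uploaded

base = b"alpine base filesystem"
store = LayerStore()
first = store.push([base, b"app A files"])   # both layers are new
second = store.push([base, b"app B files"])  # base layer is already stored
print(first, second, len(store.blobs))  # 2 1 3
```

The second image uploads only its own application layer, because the shared base layer hashes to a digest the store already holds.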

Still Confused?

Imagine you're packing for an outing!

Uncompressed: This is like throwing all your clothes, toiletries, and shoes into a big suitcase without any organization. It takes up a lot of space.
Compressed: This is like carefully rolling up your clothes, using packing cubes to organize items, and removing any unnecessary duplicate items. The whole suitcase is now much smaller.

TL;DR

Local vs. Registry: The uncompressed size is what you see locally while the compressed size is what's stored and transferred to/from a container registry.
Transfer Efficiency: Docker optimizes images during upload to a container registry, and because of compression, downloads will also take less time.
True Size: The compressed size gives you a more accurate idea of the actual space your image will occupy on a registry or when you save it as a compressed archive.
Size Optimization Goal: When trying to optimize your docker image size, the goal is to minimize the compressed size, because that affects download and upload times, and the cost of storage in the registry.

In Summary:

The uncompressed size is a useful metric to observe the effects of your changes during the image building process. However, the compressed size is the true indicator of how large your image is when stored and transferred. It's this compressed size that you should focus on when optimizing your Docker images for better performance and efficiency.

So how to check the size?

You can check the uncompressed size by running docker image ls | grep <your_image_name>. To check the compressed size, either push the image to Docker Hub or run docker save <your_image_name> | gzip -c | wc -c to see the compressed size in bytes.
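If you would rather script the measurement, the gzip-and-count step can be sketched in Python. This is a rough equivalent of gzip -c | wc -c, not a Docker API: gzipped_size is a hypothetical helper, and the demo feeds it repetitive in-memory bytes as a stand-in for a real docker save archive.

```python
import io
import zlib

def gzipped_size(stream, chunk_size=64 * 1024):
    """Count the bytes a gzip-compressed copy of `stream` would occupy."""
    # wbits=31 asks zlib for gzip framing (header and trailer included).
    compressor = zlib.compressobj(6, zlib.DEFLATED, 31)
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        total += len(compressor.compress(chunk))
    total += len(compressor.flush())
    return total

# Stand-in for an image archive: highly repetitive data compresses well.
fake_archive = b"node_modules/express/index.js\n" * 10_000
print(len(fake_archive), gzipped_size(io.BytesIO(fake_archive)))
```

To measure a real image, one option is to pass sys.stdin.buffer to gzipped_size and pipe the output of docker save into the script. Reading in chunks keeps memory use flat even for multi-gigabyte archives.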

Still confused or have doubts?
Let me know in the comments section!

Excited to share my first blog post: "Practical Docker: Step-by-Step Container Creation and Execution"! In this guide, I walk through the essential steps of building and running a simple Docker container, with a focus on practical commands and concepts.

Atharva Unde — Sat, 01 Feb 2025 11:27:11 +0000

Practical Docker: Step-by-Step Container Creation and Execution


Practical Docker: Step-by-Step Container Creation and Execution

Atharva Unde — Sat, 01 Feb 2025 11:18:30 +0000

Running your software in containers, whether it's a website, an app, or even your database, is becoming increasingly important. Docker is a powerful tool for this, and a crucial part of it is the Dockerfile.

Think of a Dockerfile as a recipe for building a Docker image. It's a simple text file that lists all the steps needed to create a complete, self-contained package (the image) of your application. This includes things like installing necessary software, setting up your environment, and copying over your code. Docker reads this recipe and automatically builds the image, making sure everything is prepared exactly the way you want it.

Why all this effort?
Repeatable results are critical in DevOps. A Dockerfile ensures that every time you build your image, you get the exact same, reliable outcome.
It eliminates the "it works on my machine" problem by providing a consistent environment.

Let's get started with the technical details of creating a Dockerfile. This will give you the foundation for building your own containerized applications.

FROM node:latest
WORKDIR /app
COPY package*.json ./
RUN npm install
COPY . .
EXPOSE 3000
CMD ["node", "index.js"]

This example showcases a fundamental Dockerfile structure. In future articles, we'll explore more techniques for writing Dockerfiles that improve efficiency, security, and maintainability.

Breakdown of the Dockerfile

FROM node:latest
This specifies the base image to use. node:latest pulls the latest official Node.js image from Docker Hub. This image already has Node.js and npm pre-installed.

WORKDIR /app
This sets the working directory inside the container to /app. This is crucial for organization and makes the Dockerfile more readable. Subsequent instructions will operate within this directory.

COPY package*.json ./
This copies the package.json and package-lock.json files from the build context (your local project directory) into the /app directory inside the container.

RUN npm install
This crucial step runs the npm install command inside the container. It installs all the dependencies listed in package.json. This ensures the container has all the necessary packages.

COPY . .
This copies all the remaining files and directories from the build context to the /app directory. This effectively copies the entire application code to the container.

EXPOSE 3000
This declares that the container will listen on port 3000. While it's important for visibility, it does not automatically map this port to your host.

CMD ["node", "index.js"]
This is the command that runs when the container starts. It executes the Node.js file index.js. This is how your application is launched within the container. If a Dockerfile contains more than one CMD instruction, only the last one takes effect.

Now that we've created the Dockerfile, let's build the actual container image. This involves using the docker build command.

To build your image, open your terminal and navigate to the directory containing your Dockerfile and the application code. Then, run the following command.

docker build -t mycontainer:latest .

This builds a Docker image named mycontainer and tags it as latest. The trailing . tells Docker to use the current directory, the one the command is run from, as the build context for the source code and other files.

If the build process is successful, you'll see output showing the image ID. You can verify the image was built by running:

docker image ls
REPOSITORY    TAG       IMAGE ID       CREATED         SIZE
mycontainer   latest    e50c98928825   7 seconds ago   1.62GB

Let's worry about the size of the image in the next chapter of the blog, where we will discuss how to write an optimized Dockerfile.

Let's run the image and see if the container responds to our requests on http://localhost:3000

docker run -d -p 3000:3000 --name myapp mycontainer:latest

This command creates a detached container named myapp from the mycontainer image using the latest tag, maps port 3000 on the host to port 3000 inside the container, and starts the application defined in your CMD instruction inside the container.

After running this command, you can access your application by navigating to http://localhost:3000 in your browser on the host machine!

Source Code: GitHub

Do let me know in the comments if you face any issues.