
Backend Interview Sheet

1. What is Docker, and Why is it Used?

Docker is an open-source containerization platform that allows developers to package applications and their dependencies into isolated environments called containers. These containers ensure that applications run consistently across different environments.

🔹 Real-Life Example:

Imagine you're developing a MERN stack web app. It works fine on your laptop, but when your teammate runs it, they get "version mismatch" errors.

With Docker, you create a consistent environment across all machines, preventing such issues.

✅ Why Use Docker?

Docker is beneficial when you need:

  • Portability → Works on any OS without compatibility issues
  • Consistency → Eliminates "It works on my machine" problems
  • Lightweight → Uses fewer system resources than virtual machines
  • Scalability → Quickly scale applications with minimal overhead

2. Main Components of Docker

🛠️ 1. Docker Daemon (dockerd)

  • The background process that manages Docker containers
  • Listens for API requests and handles images, networks, and volumes

💻 2. Docker CLI (Command-Line Interface)

  • A tool to interact with the Docker Daemon
  • Common commands:
  docker ps        # List running containers  
  docker run       # Start a new container  
  docker stop      # Stop a running container  

📦 3. Docker Images

  • A read-only template containing the application, libraries, and dependencies
  • Immutable → Once built, images don’t change
  • Used to create containers

📌 4. Docker Containers

  • A running instance of a Docker image
  • Isolated from the host system but can interact if needed (e.g., exposing ports)

🌐 5. Docker Hub

  • A cloud-based registry where Docker images are stored and shared

🗂️ 6. Docker Volumes

  • Used for persistent data storage outside of containers

📌 Illustration of Docker Components:

[Diagram: Docker architecture showing the Daemon, CLI, Images, Containers, and Volumes]


3. How is Docker Different from Virtual Machines?

⚡ Example:

You're testing a React.js + Express.js app. Instead of running a full Ubuntu VM (which consumes high RAM & CPU), you start a lightweight container in seconds:

docker run -d -p 3000:3000 node:16

Unlike a VM, which takes minutes to boot, a container starts instantly.

🆚 Docker vs. Virtual Machines

| Feature | Docker (Containers) | Virtual Machines (VMs) |
|---|---|---|
| Boot Time | Seconds | Minutes |
| Size | MBs | GBs |
| Performance | Near-native speed | Slower due to hypervisor overhead |
| Isolation | Process-level isolation | Full OS-level isolation |
| Resource Efficiency | Shares OS kernel, lightweight | Requires full OS, resource-intensive |

docker run vs. docker start vs. docker exec

docker run : Create and start a new container from an image
docker start : Restart an existing, stopped container
docker exec : Run a command inside an already-running container
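
For example, using a throwaway nginx container:

# Create and start a new container from the nginx image
docker run -d --name web nginx

# Stop it, then restart the same container (its state is preserved)
docker stop web
docker start web

# Run a command inside the running container
docker exec -it web bash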


4. Popular and Useful Docker Commands

Here are some of the most commonly used Docker commands:

🔍 Container Management

# List all running containers
docker ps  

# List all containers (including stopped ones)
docker ps -a  

# Start a stopped container
docker start <container_id>  

# Stop a running container
docker stop <container_id>  

# Remove a container
docker rm <container_id>  

🏗 Image Management

# List all available images
docker images  

# Pull an image from Docker Hub
docker pull <image_name>  

# Remove an image
docker rmi <image_name>  

📦 Build and Run Containers

# Build a Docker image from a Dockerfile
docker build -t <image_name> .  

# Run a container from an image
docker run -d -p 8080:80 <image_name>  

📂 Volume Management

# List all Docker volumes
docker volume ls  

# Create a new volume
docker volume create <volume_name>  

# Remove a volume
docker volume rm <volume_name>  

Docker Compose: docker-compose.yml

What is docker-compose.yml?

The docker-compose.yml file is used to define and run multi-container Docker applications. With Docker Compose, you can manage and orchestrate multiple services, including databases, backend APIs, and front-end applications, all in a single file.

It allows you to define services, networks, and volumes, making it easier to deploy and manage applications that require multiple services working together.


Why is docker-compose.yml Useful?

  1. Simplifies Multi-Container Management:
    Instead of managing each container manually, Docker Compose allows you to define all services (frontend, backend, database, etc.) in one configuration file and launch them with a single command.

  2. Networking and Dependency Management:
    Docker Compose automatically creates a network for your containers, allowing them to communicate with each other. Services can be referenced by their service name, which means the backend can talk to the database without needing an IP address.

  3. One Command to Start Everything:
    Instead of running individual containers with complex docker run commands, Docker Compose lets you define the services and their dependencies in a YAML file, and run everything with docker-compose up.

  4. Simplified Development Environment:
    With Docker Compose, developers can easily replicate the production environment locally, using the same configuration for services like databases, backends, and frontends. It allows seamless integration and testing, as you don't have to manually set up each service.

  5. Environment Variable Management:
    You can manage environment variables for each service within the docker-compose.yml file, making it easier to configure your application for different environments (development, testing, production).


Example of docker-compose.yml for a Web Application

Let’s walk through an example where we have three services:

  • Frontend: A React app running on port 3000.
  • Backend: A Node.js API running on port 5000.
  • Database: A MongoDB instance to store data.

version: '3.8'

services:
  frontend:
    build: ./frontend
    ports:
      - "3000:3000"
    volumes:
      - ./frontend:/app
    depends_on:
      - backend

  backend:
    build: ./backend
    ports:
      - "5000:5000"
    environment:
      - NODE_ENV=development
    depends_on:
      - database

  database:
    image: mongo
    volumes:
      - mongo-data:/data/db
    ports:
      - "27017:27017"

volumes:
  mongo-data:

Database Migrations

  1. Explain how you would design and manage a database schema using Sequelize, including the process of setting up migrations, handling model relationships, optimizing for performance, and managing database changes in a collaborative team environment.

Database Migration with Sequelize

Purpose

Database migrations allow you to safely update and manage your database schema over time. They help track changes to the schema in a version-controlled manner, making it easy to collaborate in teams.

Setting Up Migrations

  1. Initialize Sequelize with sequelize-cli to generate migration files.
  2. Migration files contain two primary methods:
    • up: For applying changes (e.g., create tables, add columns).
    • down: For rolling back changes (undoing the applied changes).
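
A minimal migration file might look like this (a sketch; the table and columns are illustrative):

'use strict';

module.exports = {
  // up: apply the change (create the Users table)
  async up(queryInterface, Sequelize) {
    await queryInterface.createTable('Users', {
      id: { type: Sequelize.INTEGER, autoIncrement: true, primaryKey: true },
      email: { type: Sequelize.STRING, allowNull: false, unique: true },
      createdAt: { type: Sequelize.DATE, allowNull: false },
      updatedAt: { type: Sequelize.DATE, allowNull: false }
    });
  },

  // down: roll the change back (drop the table)
  async down(queryInterface) {
    await queryInterface.dropTable('Users');
  }
};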

Handling Schema Changes

  • Creating Migrations:
    When you need to add, modify, or delete database schema (e.g., tables, columns), you create a new migration file.

  • Applying Migrations:
    Use the command npx sequelize-cli db:migrate to apply migrations to the database.

  • Rolling Back Migrations:
    Use npx sequelize-cli db:migrate:undo to undo the last applied migration.

Model Relationships

  • Define associations (e.g., one-to-many, many-to-many) within your models using Sequelize methods:
    • hasOne, hasMany, belongsTo, belongsToMany, etc.
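
For example, a one-to-many association (model names are illustrative):

// A User has many Posts; each Post belongs to one User
User.hasMany(Post, { foreignKey: 'userId' });
Post.belongsTo(User, { foreignKey: 'userId' });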

Collaborative Workflow

  1. Migrations should be version-controlled using Git.
  2. Each team member works with migrations, and when schema changes are required, new migrations are created and applied across all environments (development, staging, production).


GitHub Actions


Steps to Deploy on AWS EC2

1. Launch EC2 Instance

2. Add Secret Variables in GitHub

  • Go to GitHub Repo Settings → Secrets and Variables → Actions → Add Secret

3. Connect to EC2 Instance

Install Docker on AWS EC2
sudo apt-get update
sudo apt-get install docker.io -y
sudo systemctl start docker
sudo chmod 666 /var/run/docker.sock
sudo systemctl enable docker
docker --version
docker ps

4. Create Two Runners on the Same EC2 Instance

  • In the React App repo → Actions → Runner → New Self-Hosted Runner
  • Copy the download commands and run them in the EC2 instance terminal
  • Install it as a service to keep it running in the background
sudo ./svc.sh install
sudo ./svc.sh start
  • Do the same for the Node.js Runner

5. Create a Dockerfile for Node.js (Backend)
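
A minimal sketch of such a Dockerfile (the entry file and port are assumptions; adjust to your project):

# Example Dockerfile for a Node.js backend
FROM node:18-alpine
WORKDIR /app

# Install dependencies first so Docker can cache this layer
COPY package*.json ./
RUN npm ci --omit=dev

# Copy the application code and expose the API port
COPY . .
EXPOSE 5000
CMD ["node", "index.js"]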

6. Create a GitHub Actions Workflow

Create a .github/workflows/cicd.yml file
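
A sketch of what this workflow could look like (image names, ports, and secret names are assumptions, not the exact original):

name: CICD

on:
  push:
    branches: [main]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Build the backend image and push it to DockerHub
      - run: docker login -u ${{ secrets.DOCKER_USERNAME }} -p ${{ secrets.DOCKER_PASSWORD }}
      - run: docker build -t ${{ secrets.DOCKER_USERNAME }}/backend:latest .
      - run: docker push ${{ secrets.DOCKER_USERNAME }}/backend:latest

  deploy:
    runs-on: self-hosted   # the runner installed on the EC2 instance
    needs: build
    steps:
      # Pull the fresh image and replace the running container
      - run: docker pull ${{ secrets.DOCKER_USERNAME }}/backend:latest
      - run: docker rm -f backend || true
      - run: docker run -d -p 5000:5000 --name backend ${{ secrets.DOCKER_USERNAME }}/backend:latest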


7. Push Docker Images to DockerHub

8. Add Inbound/Outbound Rules on EC2 Instance

9. Access the Node.js Application

  • Use EC2_PUBLIC_IP:PORT to access your application

Deploying React App

  • Create a Dockerfile for React
  • Follow the same process as above

What is GitHub Actions, and how does it work?

GitHub Actions is a CI/CD automation tool that allows you to define workflows in YAML to build, test, and deploy applications directly from GitHub repositories.

How do you trigger a GitHub Actions workflow?

Workflows can be triggered by events such as push, pull_request, schedule, workflow_dispatch, and repository_dispatch.

What are the key components of a GitHub Actions workflow?

Key components include:

  • Workflows (YAML files in .github/workflows/)
  • Jobs (Independent execution units in a workflow)
  • Steps (Commands executed in a job)
  • Actions (Reusable units of functionality)
  • Runners (Machines that execute jobs)

What is the difference between jobs, steps, and actions?

  • Jobs: Run in parallel or sequentially within a workflow.
  • Steps: Individual tasks executed within a job.
  • Actions: Pre-built reusable components within steps.

How do you use environment variables and secrets in GitHub Actions?

  • Define environment variables using env:
  env:
    NODE_ENV: production
  • Store sensitive values in secrets:
  env:
    API_KEY: ${{ secrets.API_KEY }}

What are self-hosted runners, and when should you use them?

Self-hosted runners are custom machines used to execute workflows instead of GitHub's hosted runners. Use them for private repositories, custom hardware, or specific dependencies.

How do you cache dependencies in GitHub Actions?

Use actions/cache@v3 to cache dependencies and speed up builds:

- uses: actions/cache@v3
  with:
    path: ~/.npm
    key: npm-${{ runner.os }}-${{ hashFiles('**/package-lock.json') }}
    restore-keys: npm-${{ runner.os }}

How do you create a reusable workflow in GitHub Actions?

Define a workflow with on: workflow_call and call it from another workflow:

on: workflow_call
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - run: echo "Reusable workflow"

How do you set up a CI/CD pipeline using GitHub Actions?

Define a workflow that includes jobs for building, testing, and deploying:

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - run: echo "Building..."
  test:
    runs-on: ubuntu-latest
    steps:
      - run: echo "Testing..."
  deploy:
    runs-on: ubuntu-latest
    needs: test
    steps:
      - run: echo "Deploying..."

What is the difference between workflow_dispatch, workflow_run, and schedule triggers?

  • workflow_dispatch: Manual trigger via GitHub UI/API.
  • workflow_run: Triggered when another workflow finishes.
  • schedule: Runs workflows at specific times using cron syntax.

How do you debug a failing GitHub Actions workflow?

  • Check logs in GitHub Actions UI.
  • Use set -x in bash scripts for verbose output.
  • Add continue-on-error: true to isolate issues.

How do you run a GitHub Actions workflow locally?

Use act, a tool that simulates GitHub Actions on your local machine:

act

How do you optimize and speed up GitHub Actions workflows?

  • Use caching (actions/cache@v3).
  • Run jobs in parallel when possible.
  • Use matrix builds for different environments.
  • Limit workflow execution to necessary branches.

How do you manage permissions and security in GitHub Actions?

  • Apply the principle of least privilege to tokens (GITHUB_TOKEN).
  • Restrict secrets exposure to trusted workflows.
  • Use branch protection rules to limit workflow execution.


Websockets & Multi-backend system

Why Do Backends Need to Talk to Each Other?


In a typical client-server architecture, communication happens between the browser (client) and the backend server. However, as applications grow, keeping everything on a single server exposed to the internet becomes inefficient and unscalable.

When designing a multi-backend system, you need to consider:

  • If there are multiple services, how should they communicate when an event occurs?
  • Should it be an immediate HTTP call?
  • Should the event be sent to a queue?
  • Should the services communicate via WebSockets?
  • Should you use a Pub-Sub mechanism?

These decisions impact performance, scalability, and reliability.

Multi-Backend Communication - Final Interview Script

Question: "How do you handle communication between multiple backend services?"


Your Answer:

"When designing multi-backend systems, we have four main communication patterns, each serving different use cases.

1. HTTP/REST - Synchronous Communication

This is direct API calls between services. For example, when a user places an order, the User Service calls Order Service, which then calls Payment Service immediately.

Use case: When you need immediate response and strong consistency, like user authentication or payment validation.

Pros: Simple to implement, immediate feedback, strong consistency
Cons: Tight coupling; if one service fails, the whole chain breaks

2. Message Queues - Asynchronous 1:1

Here we use message brokers like RabbitMQ or Amazon SQS. Messages are placed in queues and consumers pick them up when ready. It's point-to-point communication - only one consumer gets each message.

Use case: Task distribution, background job processing, load balancing
Example: Multiple payment workers processing payment requests from a queue

Pros: Loose coupling, fault tolerance, load balancing
Cons: Eventual consistency, more complex error handling

3. Pub-Sub - Event Broadcasting 1:N

Publishers send events to topics, and multiple subscribers listen to the same topic. Same message goes to all subscribers.

Use case: Event-driven architecture where multiple services need to react to same event
Example: When an order is created, the Inventory Service updates stock, the Email Service sends a confirmation, and Analytics tracks metrics, all from the same event

Pros: Highly decoupled, easy to add new features, scalable
Cons: Message ordering challenges, duplicate handling needed

4. WebSockets - Real-time Communication

Persistent bidirectional connections for real-time communication.

Use case: Chat applications, live updates, gaming
Pros: Real-time, low latency
Cons: Resource intensive, connection management complexity


Key Difference - Queue vs Pub-Sub:

Both have the same components: a Publisher/Producer, a Broker, and a Consumer/Subscriber. The difference is in message delivery:

  • Queue: 1:1 - Messages compete, only one consumer gets each message
  • Pub-Sub: 1:N - Same message broadcasted to all subscribers

Real Example - E-commerce System:

I would use a hybrid approach:

  1. User places order - HTTP call for immediate validation
  2. Order processing - Pub-Sub event 'ORDER_CREATED' to notify multiple services
  3. Background tasks - Queue for heavy processing like report generation

Technology Stack:

  • Apache Kafka - Can work as both queue and pub-sub
  • RabbitMQ - For reliable message queuing
  • Redis Pub-Sub - For simple event broadcasting
  • Amazon SQS/SNS - For managed cloud solutions

Decision Framework:

Choose HTTP when: Need immediate response, strong consistency, simple flows
Choose Queues when: Task distribution, load balancing, background processing
Choose Pub-Sub when: Multiple services need same event, event-driven architecture
Choose WebSockets when: Real-time bidirectional communication needed


Production Considerations:

  • Error Handling: Circuit breakers, dead letter queues, retry mechanisms
  • Monitoring: Queue depths, processing times, error rates
  • Scalability: Horizontal scaling of consumers, proper partitioning

The key is choosing the right pattern for each specific use case rather than using one approach everywhere."


If Asked Follow-up Questions:

"What about data consistency?"

"For strong consistency, use HTTP calls. For eventual consistency, use async patterns with proper error handling and compensation transactions."

"How do you handle failures?"

"Circuit breakers for HTTP, dead letter queues for messages, retry mechanisms with exponential backoff, and proper monitoring."

"Which technology would you choose?"

"Kafka for high throughput and both queue/pub-sub needs, RabbitMQ for complex routing, SQS for simple cloud solutions."


Example: Payment Processing System


Let's consider a payment application. When a transaction occurs:

  1. The database update should happen immediately (synchronous).
  2. The notification (email/SMS) can be pushed to a queue (asynchronous).

Why not handle everything in the primary backend?

  • If the email service is down, should the user be forced to wait after completing the transaction? No!
  • Instead, we push the notification event to a queue.
  • Even if the notification service is down, the queue retains the event and sends notifications once the service is back.
  • This is why message queues (e.g., RabbitMQ, Kafka, AWS SQS) are better than HTTP for such tasks.
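
A sketch of this enqueue step with RabbitMQ via the amqplib package (the queue name and payload are illustrative):

const amqp = require('amqplib');

// Producer: the payment backend enqueues a notification event
async function enqueueNotification(event) {
  const conn = await amqp.connect('amqp://localhost');
  const channel = await conn.createChannel();
  await channel.assertQueue('notifications', { durable: true });
  channel.sendToQueue('notifications', Buffer.from(JSON.stringify(event)), { persistent: true });
  await channel.close();
  await conn.close();
}

// The transaction completes immediately; the email/SMS is sent whenever
// the (possibly temporarily down) notification service consumes the queue
enqueueNotification({ type: 'PAYMENT_SUCCESS', userId: 123 });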

Types of Communication

  1. Synchronous Communication

    • The system waits for a response from the other system.
    • Examples: HTTP requests, WebSockets (in some cases).
  2. Asynchronous Communication

    • The system does not wait for a response.
    • Examples: Message queues, Pub-Sub services.

Why WebSockets?

WebSockets provide persistent, full-duplex communication over a single TCP connection, established with one handshake.

Limitations of HTTP:

  • In HTTP, the server cannot push events to the client on its own.
  • The client (browser) can request, and the server can respond, but the server cannot initiate communication with the client.

WebSockets vs. HTTP for Real-Time Applications

Example: Stock Market Trading System

  • Stock buying & selling generates millions of requests per second.
  • If you use HTTP, every request requires a three-way handshake, adding latency and overhead.
  • With WebSockets, the handshake happens only once, and then the server and client can continuously exchange data.

Alternative: Polling

If you still want to use HTTP for real-time updates, an alternative approach is polling.

  • However, polling creates unnecessary load on the server by making frequent requests.
  • WebSockets are a more efficient solution for real-time updates.
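
A minimal WebSocket server sketch using the ws package (the port and payload are illustrative):

const { WebSocketServer } = require('ws');

const wss = new WebSocketServer({ port: 8080 });

wss.on('connection', (socket) => {
  // The server can now push data to the client at any time,
  // something plain HTTP cannot do
  socket.send(JSON.stringify({ type: 'PRICE_UPDATE', symbol: 'AAPL', price: 190.12 }));

  socket.on('message', (data) => {
    console.log('received:', data.toString());
  });
});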


Some Basic Questions

Basic

What is Node.js?

Node.js is a runtime environment for executing JavaScript on the server side. It is not a framework or a language. A runtime is responsible for memory management and converting high-level code into machine code.

Examples:

  • Java: JVM (Runtime) → Spring (Framework)
  • Python: CPython (Runtime) → Django (Framework)
  • JavaScript: Node.js (Runtime) → Express.js (Framework)

With Node.js, JavaScript can run outside the browser as well.

Runtime vs Frameworks

  • Runtime: Focuses on executing code, handling memory, and managing I/O.
  • Framework: Provides structured tools and libraries to simplify development.

What happens when you enter a URL in the browser and hit enter?

DNS Lookup

The browser checks if it already knows the IP address for www.example.com. If not, it contacts a DNS (Domain Name System) server to get the IP address (e.g., 192.168.1.1).

Establishing Connection

The browser initiates a TCP connection with the web server using a process called the three-way handshake. If the website uses HTTPS, a TLS handshake happens to encrypt the communication.

Sending HTTP Request

The browser sends an HTTP request to the server:

GET / HTTP/1.1
Host: www.example.com

Server Processing

The web server processes the request and may fetch data from a database and generate a response (HTML, JSON, etc.).

Receiving the Response

The server sends an HTTP response back to the browser:

HTTP/1.1 200 OK
Content-Type: text/html

Rendering the Page

The browser processes the HTML, CSS, and JavaScript and displays the webpage.

Difference Between Monolithic and Microservices Architecture

Monolithic Architecture

  • All components (UI, DB, Auth, etc.) are tightly coupled.
  • Single application handles everything.

Microservices Architecture

  • Divided into small, independent services.
  • Each service handles a specific function (Auth, Payments, etc.).

Pros:

  • Scalable
  • Services can use different tech stacks

Cons:

  • More complex to manage
  • Requires API communication

HTTP Status Codes

  • 200 OK
  • 201 Created
  • 400 Bad Request
  • 401 Unauthorized
  • 402 Payment Required
  • 404 Not Found
  • 405 Method Not Allowed
  • 500 Internal Server Error

What is CORS?

CORS stands for Cross-Origin Resource Sharing, a security feature built into browsers.
It blocks requests from one origin (domain, protocol, or port) to another origin unless the server explicitly allows them.
For example: your frontend is hosted at frontend.com and your backend at backend.com.
The browser treats these as different origins and blocks the request unless it is explicitly allowed.
Why does this happen? CORS errors are triggered by the Same-Origin Policy, which prevents malicious websites from making unauthorized API calls using your credentials.

The browser isn't blocking the request; it's blocking the response, for security reasons.
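
On the server, you fix this by explicitly allowing the origin. A minimal Express sketch using the cors middleware (the origin URL is illustrative):

const express = require('express');
const cors = require('cors');

const app = express();

// Allow cross-origin requests from the frontend's origin only
app.use(cors({ origin: 'https://frontend.com' }));

app.get('/api/data', (req, res) => res.json({ ok: true }));

app.listen(5000);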

REST vs GraphQL

REST API:

"REST (Representational State Transfer) is an architectural style where data is fetched using multiple endpoints, and each request returns a fixed structure of data."

GraphQL:

"GraphQL is a query language for APIs that allows clients to request only the data they need, reducing overfetching and underfetching."

💡 Key Point:

  • REST APIs have multiple endpoints (/users, /orders), while GraphQL has a single endpoint (/graphql).
  • GraphQL provides more flexibility by allowing clients to request exactly what they need in a single query.
  • REST APIs return predefined responses and sometimes require multiple requests.
  • If performance and flexibility are key concerns, GraphQL is a better choice.

How Do You Design an API for a Large-Scale System?

  • Use Microservices: Separate services (Auth, Payments, etc.).
  • Load Balancers: Distribute traffic efficiently.
  • Caching: Use Redis for frequently accessed data.
  • Pagination: Send data in chunks.
  • Rate Limiting: Prevent API abuse.
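
For instance, rate limiting can be added to an Express API with the express-rate-limit middleware (a sketch; the limits are illustrative):

const express = require('express');
const rateLimit = require('express-rate-limit');

const app = express();

// Allow at most 100 requests per IP per 15-minute window
app.use(rateLimit({ windowMs: 15 * 60 * 1000, max: 100 }));

app.listen(5000);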

What is Pagination? How to Implement It?

Pagination breaks large datasets into smaller parts.
Implementation:

  • Use limit and offset in database queries.
  • Example:
  SELECT * FROM users LIMIT 10 OFFSET 20;
  • Use cursor-based pagination for better performance.
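
A cursor-based version remembers the last row seen instead of counting an offset (the column values are illustrative):

-- Fetch the next 10 users after the last seen id (the "cursor")
SELECT * FROM users WHERE id > 20 ORDER BY id LIMIT 10;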

How Do You Handle File Uploads?

  • Single file upload: Use multipart/form-data with Express.js & Multer.
  • Large file handling: Use chunked uploads.
  • Storage options: Store files on AWS S3, Google Cloud Storage, or a database.
  • Server-side Upload: The file is uploaded to your backend server first, and then the server sends it to S3 or Cloudinary.

JWT - Final Interview Answer Script

Question: "What is JWT? How does it work?"


Your Complete Answer:

"JWT stands for JSON Web Token. It's a stateless authentication mechanism where user information is encoded in a token that can be verified without storing session data on the server.

JWT Structure - 3 Parts:

JWT has three parts separated by dots:
header.payload.signature

1. Header: Contains metadata about the token

{
  "alg": "HS256",    // Algorithm used
  "typ": "JWT"       // Token type
}

2. Payload: Contains user information and claims

{
  "userId": 123,
  "role": "admin",
  "exp": 1640995200    // Expiry timestamp
}

3. Signature: Ensures token integrity and authenticity

  • Created by signing the header + payload with a secret key
  • Used to verify token hasn't been tampered with

How JWT Authentication Works:

Step 1 - User Login:

  • User sends credentials to server
  • Server validates credentials
  • If valid, server creates JWT token

Step 2 - Token Creation:

  • Server creates header and payload
  • Server generates signature using secret key: HMAC-SHA256(header.payload, secretKey)
  • All three parts are combined: header.payload.signature

Step 3 - Token Usage:

  • Server sends token to client
  • Client stores token (localStorage or cookie)
  • Client sends token in Authorization header for API requests

Step 4 - Token Verification:

  • Server receives token with request
  • Server splits token into three parts
  • Server recreates signature using same secret key
  • If signatures match, token is valid
  • Server extracts user info from payload
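
A sketch of steps 2 and 4 using the jsonwebtoken package (the secret and claims are illustrative):

const jwt = require('jsonwebtoken');

const SECRET = process.env.JWT_SECRET; // never hardcode or expose this

// Step 2: create a signed token at login
const token = jwt.sign({ userId: 123, role: 'admin' }, SECRET, { expiresIn: '15m' });

// Step 4: verify the token on each request
try {
  const payload = jwt.verify(token, SECRET); // throws if tampered with or expired
  console.log(payload.userId); // 123
} catch (err) {
  // invalid signature or expired token → respond with 401
}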

Key Benefits:

Stateless: No need to store session data on server
Scalable: Works across multiple servers
Self-contained: All user info is in the token
Cross-domain: Can work across different domains

Security Considerations:

Secret Key: Never expose the secret key used for signing
Expiry: Always set short expiry times (15-30 minutes)
HTTPS: Always use HTTPS to prevent token interception
Storage: Be careful about XSS if storing in localStorage

Real-world Example:

When user logs into an e-commerce site:

  1. User enters username/password
  2. Server validates and creates JWT with user ID, role, expiry
  3. Client stores JWT and sends it with every API call
  4. Server verifies JWT and processes request
  5. When token expires, user needs to login again or refresh token

JWT vs Sessions:

JWT:

  • Stateless (no server storage)
  • Better for APIs and microservices
  • Self-contained

Sessions:

  • Stateful (server stores session data)
  • Better for traditional web apps
  • More secure (data on server)

The choice depends on your architecture - use JWT for REST APIs and distributed systems, sessions for traditional web applications."


If Asked Follow-up Questions:

"How do you handle token expiry?"

"Use refresh tokens. Short-lived access tokens (15 mins) with longer-lived refresh tokens (7 days). When access token expires, use refresh token to get new access token."

"What if someone steals the JWT?"

"That's why we use short expiry times, HTTPS only, and httpOnly cookies when possible. Also implement token blacklisting for logout."

"Can JWT be modified?"

"If someone modifies the payload, the signature won't match because they don't have the secret key. Server will reject the token."

"Where do you store JWT on client?"

"For web apps: httpOnly cookies for security, or localStorage for convenience but with XSS risk. For mobile: secure storage."


Question: "Explain Cookies, Sessions, Tokens, and Local Storage for authentication."


Your Answer:

"These are four different ways to handle user authentication and data storage. Let me explain each:

1. COOKIES - Automatic Browser Storage

What it is:
Cookies are small pieces of data that the server sends to the browser, and the browser automatically sends them back with every request.

How it works:

  • Server creates cookie and sends to browser
  • Browser stores it automatically
  • Browser includes cookie in every HTTP request to that domain
  • Server reads cookie data from request

Authentication use:

User logs in → Server creates cookie: authId=abc123 → Browser stores it → 
Every request includes: Cookie: authId=abc123 → Server validates cookie

Example: When you login to Facebook, server sets cookie with session ID. Now every page you visit automatically sends this cookie.


2. SESSIONS - Server-Side Storage

What it is:
Session is user data stored on the server, identified by a session ID that's typically stored in a cookie.

How it works:

  • User logs in → Server creates session data in memory/database
  • Server generates unique session ID
  • Session ID is sent to browser via cookie
  • Browser sends session ID back with requests
  • Server looks up session data using this ID

Authentication flow:

Login → Server creates: sessions[abc123] = {userId: 456, role: 'admin'} →
Cookie: sessionId=abc123 → Server uses ID to fetch user data

Example: Traditional web applications where user data is stored on server for security.


3. TOKENS (JWT) - Self-Contained Authentication

What it is:
A token is an encoded string containing user information that can be verified without storing anything on the server.

How JWT works:

  • Contains 3 parts: Header.Payload.Signature
  • Payload has user info (userId, role, expiry)
  • Signature ensures token hasn't been tampered with
  • Server can verify token without database lookup

Authentication flow:

Login → Server creates JWT token with user info → Client stores token →
Client sends: Authorization: Bearer <token> → Server verifies signature

Example: REST APIs where each request includes JWT token in Authorization header.


4. LOCAL STORAGE - Browser Client Storage

What it is:
Browser's built-in storage that persists data locally, accessible via JavaScript.

How it works:

  • JavaScript can store/retrieve data: localStorage.setItem('token', 'abc123')
  • Data persists even after browser closes
  • Available to JavaScript on same domain
  • 5-10MB storage capacity

Authentication use:

Login → Store token: localStorage.setItem('authToken', token) →
API calls → Get token: localStorage.getItem('authToken') →
Send manually: headers: { Authorization: 'Bearer ' + token }

Example: Single Page Applications (SPAs) where JavaScript manages authentication.


Key Differences Summary:

Storage Location:

  • Cookies: Browser (managed automatically)
  • Sessions: Server-side (secure)
  • Tokens: Client-side (self-contained)
  • Local Storage: Browser (manual JavaScript)

Security:

  • Cookies: Can be HttpOnly (XSS safe), but CSRF risk
  • Sessions: Most secure (data on server)
  • Tokens: Stateless but vulnerable if stolen
  • Local Storage: Vulnerable to XSS attacks

Usage:

  • Cookies: Automatic with every request
  • Sessions: Server looks up data using session ID
  • Tokens: Manual inclusion in headers
  • Local Storage: Manual JavaScript handling

When to Use What:

Use Cookies + Sessions when:

  • Traditional web applications
  • Maximum security needed
  • Server-side rendering
  • Simple user flows

Use Tokens (JWT) when:

  • REST APIs
  • Mobile applications
  • Microservices architecture
  • Need stateless authentication

Use Local Storage when:

  • Single Page Applications (SPAs)
  • Need persistent client-side data
  • Want manual control over auth flow
  • Client-side JavaScript frameworks



Intermediate

What is full text search?

Full-text search matches words and phrases inside natural-language text (using tokenization, stemming, and relevance ranking) rather than exact string equality; common implementations include PostgreSQL/MySQL full-text indexes and ElasticSearch.

What are Serverless and Serverful Backends?

A serverful backend means you manage the entire server, while a serverless backend means you don't manage servers at all: your code runs only when needed on cloud platforms like AWS Lambda.
Example: Imagine you are building a food delivery app like Zomato or Uber Eats.

If you use a serverful backend:

  • You set up an Express.js server on AWS EC2.
  • The server is always running, handling all API requests like fetching restaurants, placing orders, and tracking deliveries.
  • You pay for the server 24/7, even when there are no active users.

If you use a serverless backend:

  • You use AWS Lambda functions to handle API requests.
  • When a user places an order, the function runs only for that request and then shuts down.
  • You only pay for execution time, making it cost-effective.

Can you explain single-threaded vs. multi-threaded processing?

Single-threaded programs execute one task at a time, while multi-threaded programs can execute multiple tasks in parallel. However, single-threaded systems can still be asynchronous using event loops, like in Node.js. If I were building a CPU-intensive app like a video editor, I’d go with multi-threading. But for an API server handling multiple users, I’d use a single-threaded, asynchronous model like Node.js to handle requests efficiently

🧠 Web Server Request Handling – Full Interview Deep Dive

Understand how web servers handle various types of requests, what part of the system gets triggered, and why CPU, disk, and memory are used in different ways.


🔹 Case 1: Static File Request (e.g., GET /index.html)

🧱 Architecture:

Client → Web Server (Nginx, Apache) → Disk

| Step | Description | CPU Used? | Why |
|---|---|---|---|
| 1 | TCP Connection Establishment | | OS uses CPU threads to handle the new socket connection |
| 2 | TLS Handshake (if HTTPS) | ✅✅ | Public-key crypto (RSA/ECC), key exchange – very CPU intensive |
| 3 | HTTP Request Parsing | | Server reads headers, URL, method |
| 4 | Check In-Memory Cache | ⚠️ Sometimes | If the file is cached, skip disk I/O (saves time and CPU) |
| 5 | Disk I/O – Read File | ⚠️ + I/O | Slowest part if uncached (mechanical disk = even slower) |
| 6 | Build HTTP Response | | Add headers, content-type, status, etc. |
| 7 | Send Response (TCP Send) | | Network stack and syscalls involve CPU |

✅ Conclusion:

  • Mostly I/O bound, but CPU handles parsing & networking
  • With HTTPS, CPU spikes due to encryption

🔹 Case 2: Dynamic Request (Backend involved)

e.g., GET /profile?id=10

🧱 Architecture:

Client → Web Server → Backend Server → DB

| Step | Description | CPU Used? | Why |
|---|---|---|---|
| 1 | TCP + TLS Handshake | ✅✅ | Same as static case |
| 2 | Request Parsing | | Headers, query params |
| 3 | Reverse Proxy to Backend | | Web server forwards via IPC/port |
| 4 | Backend App Logic | ✅✅ | Routing, auth, business logic (CPU heavy) |
| 5 | Database Query | ⚠️ CPU + I/O | Reads/writes involve disk and DB engine CPU |
| 6 | Response Generation (HTML/JSON) | ✅✅ | Templating or serialization is CPU-bound |
| 7 | Send Response → Client | | Network transmission |

✅ Conclusion:

  • This is both CPU + I/O bound
  • More cores help in scaling
  • Backend does the heavy lifting, web server is just the router

🔹 Case 3: Cached Response

🧱 Architecture:

Client → Web Server → Cache (Redis/Memcached/internal) → Client

| Step | Description | CPU Used? | Why |
|---|---|---|---|
| 1 | TCP + HTTP Parsing | | Normal |
| 2 | Cache Lookup (Memory) | ⚠️ | Fast RAM lookup, nearly no disk or backend call |
| 3 | Response Ready → Send | | Minimal CPU for sending back |

✅ Conclusion:

  • Fastest flow among all
  • Skips backend & disk I/O → highly efficient
  • Caching = performance booster

🔹 Case 4: Reverse Proxy (Static + Dynamic Mix)

🧱 Architecture:

Client → Nginx (Reverse Proxy) → Static OR Backend

| Step | Description | CPU Used? | Why |
|---|---|---|---|
| 1 | Request to Nginx | | Parses incoming request |
| 2 | Nginx Checks Routes | | Matches URI patterns |
| 3 | Serve Static (if matched) | ⚠️ | Disk read if not cached |
| 4 | Else Proxy to Backend | | Same as Case 2 |
| 5 | Send Response Back | | Nginx acts as gateway |

✅ Conclusion:

  • Nginx = Traffic Manager
  • Smart separation between static and dynamic content
  • Efficient request routing saves resources

🔹 Case 5: HTTPS (TLS) Request

| Step | Description | CPU Used? | Why |
|---|---|---|---|
| 1 | TCP Connection | | Basic connection setup |
| 2 | TLS Handshake | ✅✅✅ | Expensive: cert validation, RSA/AES/ECC operations |
| 3 | HTTP Parsing | | After TLS tunnel established |

✅ Conclusion:

  • TLS is CPU-heavy
  • TLS Offloading to Cloudflare or Load Balancer is often used

🔹 Case 6: API Request (POST JSON)

🧱 Architecture:

Client → Web Server/API Gateway → Backend → DB

| Step | Description | CPU Used? | Why |
|---|---|---|---|
| 1 | Receive POST | | TCP + header parsing |
| 2 | JSON Body Parsing | ✅✅ | Deserialization consumes CPU |
| 3 | Business Logic | ✅✅ | Auth, validation, core logic |
| 4 | DB Query | ⚠️ | DB fetch/update |
| 5 | Build JSON Response | ✅✅ | JSON.stringify() or equivalent |
| 6 | Send Response | | Network syscall |

✅ Conclusion:

  • APIs (especially large JSON) are CPU-bound
  • Parsing/serializing JSON = CPU cycles
  • Use optimized libraries (like fast-json-stringify, etc.)

🔹 Case 7: File Upload / Download

🧱 Architecture:

Client → Web Server → Disk / Object Store (e.g., S3)

| Step | Description | CPU Used? | Why |
|---|---|---|---|
| 1 | TCP + Parse | | Start request |
| 2 | Read File Chunks (Upload) | ✅ + I/O | Buffered I/O reads |
| 3 | Write to Disk/S3 | ⚠️ | Disk or network-based I/O |
| 4 | Send Acknowledgement | | Final response |

✅ Conclusion:

  • I/O-bound process, CPU handles chunking and buffering
  • Network & Disk performance matter a lot here


HTTP/2 and HTTP/3 Support in Web Servers


🔹 What is HTTP?

  • HTTP (HyperText Transfer Protocol) is an application-layer protocol used for communication between clients (like browsers) and web servers.
  • Versions: HTTP/1.1 → HTTP/2 → HTTP/3

🚀 Why HTTP/2 and HTTP/3?

  • To improve latency, reduce page load times, and utilize modern internet features like multiplexing, better compression, and faster handshake.

🔸 HTTP/1.1 Limitations (Why Upgrade?)

  • Head-of-line (HOL) blocking: One slow resource blocks others.
  • Multiple TCP connections needed → overhead.
  • No compression of headers.
  • High latency in handshake and transfer.

✅ HTTP/2 Features

1. Multiplexing

  • Multiple streams (requests/responses) over a single TCP connection.
  • No need for multiple TCP connections.
┌────────────┐
│ Browser    │
├────────────┤
│ req1       │──────┐
│ req2       │─────►│
│ req3       │──────┘
│            │
└────────────┘
         ↓
     One TCP connection

2. Binary Framing

  • All messages (headers, data) are encoded in binary format instead of plain text → faster and more compact.

3. Header Compression (HPACK)

  • HTTP headers are compressed to save bandwidth.

4. Server Push (Optional)

  • Server can "push" resources (CSS/JS/fonts) before the client even asks.
  • Useful in predictable page loads.

→ Client: GET /index.html
← Server: /index.html + /style.css + /app.js (pushed without asking)

HTTP/3: What Changed Again?

✅ Uses the QUIC protocol instead of TCP

QUIC = Quick UDP Internet Connections (built by Google)

Why QUIC? TCP has these problems:

  • Slow connection setup (3-way handshake)
  • Head-of-Line blocking at the TCP level
  • Connection loss resets everything


🧠 Web Server vs Application Server - Deep Dive


🖥️ 1. What is a Web Server?

🔧 Primary Role:

A web server handles static content such as:

  • HTML
  • CSS
  • JavaScript
  • Images (JPG, PNG, etc.)

It serves files directly from disk to the client browser.

💡 Think of a Web Server like a waiter — it brings pre-cooked food (static files) to your table.


⚙️ Features of Web Server

| Feature | Description |
|---|---|
| Static File Serving | Serves .html, .css, .js, and images directly from the file system |
| SSL/TLS Termination | Handles HTTPS encryption/decryption (SSL certificates) |
| Caching | Stores frequently requested files in memory to improve speed |
| Load Balancing | Distributes incoming requests across multiple App Servers |

🌐 Popular Web Servers

  • Apache HTTPD (older but reliable)
  • Nginx (very fast, efficient)
  • Caddy (auto HTTPS with Let's Encrypt)

🏭 2. What is an Application Server?

🔧 Primary Role:

An Application Server handles dynamic content. It:

  • Executes backend code
  • Fetches data from databases
  • Performs business logic

💡 Think of an Application Server as a chef — it cooks fresh food (generates dynamic content) based on your order (request).


⚙️ Features of App Server

| Feature | Description |
|---|---|
| Code Execution | Runs backend code (e.g., Express, Django, Spring Boot) |
| DB Connectivity | Connects to databases like MySQL, MongoDB, PostgreSQL |
| Session Management | Maintains user sessions, login state, etc. |
| Transactions | Ensures atomic DB operations (commit or rollback) |

💡 Common Examples

| Language | Application Servers |
|---|---|
| Node.js | Express.js, NestJS |
| Java | Tomcat, Jetty, WildFly |
| Python | Django, Flask, FastAPI |
| PHP | Laravel, Symfony |

🔄 3. How They Work Together

Client (Browser / Mobile App)
⬇️
Web Server (Nginx / Apache)
⬇️
Static Route? ➡️ Serve static file directly
⬇️
Dynamic Route? ➡️ Forward to App Server
⬇️
App Server (Express / Django)
⬇️
DB, Business Logic Execution
⬇️
Response sent back via Web Server
⬇️
Client receives result



Why do we separate static and dynamic content handling?

  • Performance: static files (e.g., images, JS) can be cached and served quickly by a web server like Nginx.
  • Scalability: separating allows static content to be offloaded from the heavier app server.
  • Security: keeps the app logic isolated; static servers don't need access to databases or internal logic.
  • Simplicity: web servers are optimized for speed and concurrency, while app servers are optimized for logic and computation.
Can a single server act as both web and application server?

✅ Yes, especially in small-scale setups. Node.js Express, Django, and Spring Boot can all serve both static and dynamic content.

However, in production it's a best practice to separate them:

  • Nginx (web server) handles routing, SSL, and compression.
  • The app server handles dynamic requests.

⚙️ Technical

1. How does Nginx improve performance with caching and load balancing?

  • Caching: stores frequent responses (e.g., HTML pages, JSON APIs) in memory, reducing load on backend app servers and databases.
  • Load balancing: distributes incoming traffic across multiple app servers using methods like Round Robin, Least Connections, and IP Hash, ensuring high availability and scalability.
  • Extra features: connection pooling, GZIP compression, SSL offloading.

2. What happens when an HTTPS request reaches Nginx?

  • TLS handshake: Nginx decrypts the request using the SSL certificate, ensuring data confidentiality and authenticity.
  • Routing: Nginx uses server_name and location blocks to match the request.
  • Proxying (if configured): passes the decrypted request to a backend app server over HTTP (or internal HTTPS).
  • Response: Nginx sends the encrypted response back to the client.

✅ You can also use Nginx as a reverse proxy + SSL terminator.



⛓️ What Is a Presigned URL?

A presigned URL is a special type of temporary, secure link that allows someone to access a specific resource — like a file in cloud storage — without logging in or having permanent credentials.

It gives permission to perform actions like:

  • 🔼 Uploading a file
  • 🔽 Downloading a file
  • ❌ Deleting a file

... for a limited time.

This is especially useful when you:

  • Want users to upload or download files without giving them full access to your server or cloud.
  • Need secure sharing without managing login systems or API keys.

🛠️ How It Works (Behind the Scenes)

Let’s break down the upload process using a YouTube-like example:

✅ Step 1: Client Requests a Presigned URL

When a user wants to upload a video, the client (e.g., browser or mobile app) sends a request to YouTube’s backend asking for a presigned URL.

✅ Step 2: Server Generates Presigned URL

The backend (YouTube server) generates a secure, short-lived URL using:

  • The file path (Key)
  • HTTP method (PUT for upload)
  • Expiry time (e.g., 5 minutes)
  • A cryptographic signature created using AWS credentials

✅ Step 3: URL Is Sent to Client

The server returns the presigned URL to the user’s device.

✅ Step 4: Client Uploads File Directly to Cloud

The client uploads the video directly to S3 using the URL, bypassing the application server entirely.

✅ Step 5: S3 Validates & Stores the File

S3 checks the URL’s validity:

  • Is the signature correct?
  • Has the URL expired?

If valid, the upload is accepted and stored. The backend can then be notified to process or catalog the file.


⚙️ What’s Inside a Presigned URL?

A presigned URL contains:

  • The target resource (bucket + file path)
  • The action allowed (PUT, GET, DELETE)
  • Expiry timestamp
  • A secure signature (HMAC with access key)

This ensures that only authorized, time-bound operations are allowed.
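
Server-side generation can be sketched with the AWS SDK v3 (the bucket, key, region, and expiry are illustrative):

const { S3Client, PutObjectCommand } = require('@aws-sdk/client-s3');
const { getSignedUrl } = require('@aws-sdk/s3-request-presigner');

const s3 = new S3Client({ region: 'us-east-1' });

// Allow exactly one action (PUT) on one key, valid for 5 minutes
async function createUploadUrl() {
  const command = new PutObjectCommand({ Bucket: 'my-videos', Key: 'uploads/video.mp4' });
  return getSignedUrl(s3, command, { expiresIn: 300 });
}

createUploadUrl().then((url) => console.log(url)); // the client PUTs the file to this URL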


🚀 Why Use Presigned URLs Instead of Traditional Uploads?

| Traditional Upload | Presigned URL |
|---|---|
| File flows through backend | File uploads directly to cloud |
| Backend must handle large files | Backend just creates the URL |
| Slower and expensive | Fast and scalable |
| Higher server load | Offloaded to cloud (e.g., S3) |
| Exposes infrastructure to risks | Link auto-expires, more secure |

✅ Presigned URLs are:

  • 🚀 Faster
  • 💰 Cheaper
  • 🔐 More secure
  • 🌐 Easier to scale

🌐 AJAX – Asynchronous JavaScript and XML

✅ What is AJAX?

AJAX is a technique used in web development to send and receive data from a server asynchronously without reloading the entire web page.

🔁 AJAX allows partial page updates, making web apps fast and interactive.


🧠 Full Form:

Asynchronous JavaScript And XML (originally XML; today JSON is mostly used instead)


📱 Real-World Example:

Google Search Suggestions:

When you type in Google’s search bar, suggestions appear immediately without reloading the page. This is powered by AJAX.


⚙️ Technologies Involved:

| Technology | Role |
|---|---|
| HTML/CSS | Structure & styling |
| JavaScript | Logic and events |
| XMLHttpRequest / fetch() | Send/receive data to/from the server |
| JSON/XML | Data format used for communication |
| DOM | Updating the web page dynamically |

🔁 How AJAX Works (Step-by-Step):

  1. User interacts with the web page (e.g., clicks a button).
  2. JavaScript sends a request to the server (in background).
  3. Server processes the request and sends data back.
  4. JavaScript receives the data and updates the web page (without reload).

📦 Example Code (Using fetch API):

// Send AJAX request to server
fetch('/api/user')
  .then(response => response.json())
  .then(data => {
    // Update page dynamically
    document.getElementById('username').innerText = data.name;
  });


Database Partitioning vs Sharding

🔍 Introduction

As data grows exponentially in modern systems, managing and querying large datasets efficiently becomes critical. Two common approaches to handle large-scale databases are:

  • Partitioning: Dividing data within a single database.
  • Sharding: Distributing data across multiple databases or servers.

Both techniques improve performance, scalability, and maintainability, but they serve different purposes and operate at different levels of system architecture.


1️⃣ What is Partitioning?

✅ Definition:

Partitioning is the process of dividing a single large table or index into smaller, manageable pieces called partitions.

These partitions are still part of the same logical table and are managed by the same database engine.

🔧 Types of Partitioning:

| Type | Description | Use Case |
|---|---|---|
| Range | Data split by value range in a column | Time-based data (logs, sales) |
| List | Data split by discrete column values | Country/region/user-type |
| Hash | Data distributed via a hash function | Even load distribution |
| Composite | Combines two types (e.g., Range + Hash) | Multi-dimensional datasets |
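
For instance, range partitioning in PostgreSQL looks like this (the table and ranges are illustrative):

-- Parent table declares the partitioning scheme
CREATE TABLE sales (
  id         bigint,
  sale_date  date NOT NULL,
  amount     numeric
) PARTITION BY RANGE (sale_date);

-- Each partition holds one year of rows; queries filtering on sale_date
-- only scan the matching partition (partition pruning)
CREATE TABLE sales_2024 PARTITION OF sales
  FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');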

🧱 Horizontal vs Vertical Partitioning:

| Type | Description | Use Case |
|---|---|---|
| Horizontal | Split rows across partitions | Logs, user records, transactions |
| Vertical | Split columns across tables | Sensitive vs non-sensitive data |

✅ Benefits

  • Faster queries (due to partition pruning)
  • Easier maintenance (backup/drop/archive)
  • Scalability within a single database

⚠️ Drawbacks

  • Added schema complexity
  • Not all DBs support all partition types
  • Uneven data can cause data skew

2️⃣ What is Sharding?

✅ Definition:

Sharding is the process of splitting a dataset across multiple physical databases or servers, each called a shard.

Each shard holds a subset of the entire data and can be queried independently.

🔧 Types of Sharding:

| Type | Description | Use Case |
|---|---|---|
| Horizontal | Different rows in each shard | Large user base split by user_id |
| Vertical | Different tables or services per shard | Microservices with separate schemas |
| Geo-Sharding | Based on geography or region | Global apps (e.g., Asia, EU users) |

🧱 Example:

| Shard | Data Range |
|---|---|
| Shard 1 | user_id 1–10 million |
| Shard 2 | user_id 10M–20 million |
| Shard 3 | user_id 20M–30 million |

🛠 Tools That Support Sharding:

  • MongoDB (built-in)
  • Vitess (MySQL)
  • Citus (PostgreSQL)
  • Cassandra (sharded by design)
  • ElasticSearch (auto-sharding)

✅ Benefits

  • True horizontal scaling
  • Improved availability & fault isolation
  • Handles very large datasets across regions

⚠️ Drawbacks

  • Complex to implement and maintain
  • Cross-shard joins are difficult
  • Requires careful shard key design
  • Complex backup & consistency management

🔁 Partitioning vs Sharding: Comparison Table

| Feature | Partitioning | Sharding |
|---|---|---|
| Scope | Inside one database | Across multiple databases/servers |
| Managed By | Database engine | Application or shard middleware |
| Logical Unit | Table partition | Database/shard |
| Cross-Partition Joins | Supported | Difficult or unsupported |
| Scalability | Limited to the DB machine | Horizontally scalable |
| Use Case | Structured, large tables | Global-scale systems (Facebook, etc.) |

📌 Summary

  • Partitioning is suitable for scaling within a single database and improving query performance for large tables.
  • Sharding is ideal for massive-scale, distributed systems that require true horizontal scaling and fault tolerance.

Use the right strategy based on your system's architecture, data volume, and scalability requirements.





🧭 Difference Between Observability and Monitoring

| Aspect | Monitoring | Observability |
|---|---|---|
| 🔍 Definition | Collecting predefined metrics to track system health | Understanding the internal state of a system by analyzing outputs |
| 🎯 Goal | Detect known issues and alert when something breaks | Investigate and diagnose unknown or complex issues |
| 🔧 Approach | Reactive – predefined checks and dashboards | Proactive – enables asking new questions and exploring behavior |
| 🔬 Focus | Known problems | Unknown unknowns |
| 🧱 Components | Metrics, alerts, dashboards | Metrics + Logs + Traces (the 3 Pillars of Observability) |
| 📊 Tools | Prometheus, Nagios, Zabbix | OpenTelemetry, Grafana, Jaeger, Honeycomb |
| 🚨 Use case | Alert when CPU > 90% | Understand why latency is increasing randomly |
| 💡 Analogy | A thermometer shows temperature | A doctor uses symptoms + scans + history to diagnose |

📦 Example

Monitoring:

  • You set a rule: “Alert me if memory usage goes above 90%”.
  • You get notified when it does.

Observability:

  • Your app slows down.
  • You don't know why.
  • You dive into metrics, traces, logs – see a DB call is slow due to network latency.
  • You find a misconfigured load balancer in a specific region.

Key Takeaway:

Monitoring is a subset of Observability.

Observability is about having enough data and tooling to answer any question about your system, even if you didn’t anticipate the issue in advance.



📡 What is OpenTelemetry?

OpenTelemetry is a vendor-neutral, open-source observability framework by the CNCF that provides standardized tools to collect, process, and export telemetry data — specifically metrics, logs, and traces — from applications and infrastructure.

It consists of:

  • SDKs for instrumentation, and
  • A collector component that receives telemetry data, processes it (like batching or sampling), and exports it to observability backends like New Relic, Prometheus, Jaeger, or any OTLP-compatible platform.

💪 Why OpenTelemetry is Powerful

What makes OpenTelemetry powerful is that it decouples telemetry generation from storage or visualization.

You write once using OTel SDKs and can export to any backend without being locked into a vendor.
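
A minimal Node.js instrumentation sketch (these are the standard OTel packages; the collector URL is an assumption):

const { NodeSDK } = require('@opentelemetry/sdk-node');
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-http');
const { getNodeAutoInstrumentations } = require('@opentelemetry/auto-instrumentations-node');

// Traces are exported over OTLP; swapping backends means swapping
// only this exporter config, not the instrumentation itself
const sdk = new NodeSDK({
  traceExporter: new OTLPTraceExporter({ url: 'http://localhost:4318/v1/traces' }),
  instrumentations: [getNodeAutoInstrumentations()],
});

sdk.start();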


🧪 Real-World Example

In my previous project at Janitri, I used OpenTelemetry SDKs in the backend to instrument REST APIs and used the OpenTelemetry Collector to forward metrics to Prometheus.

Logs and traces were optionally integrated via extensions.


🔄 In a New Relic Setup

This same SDK can send data directly to New Relic via the OTLP exporter, giving you full-stack visibility — with no vendor-specific lock-in.


🎯 Conclusion

That’s the beauty of OpenTelemetry:

  • It’s interoperable
  • It’s future-proof
  • It aligns deeply with New Relic’s support for open standards


📊 What is Prometheus?

Prometheus is an open-source, time-series database and monitoring system originally developed by SoundCloud and now part of the CNCF (Cloud Native Computing Foundation).

It is designed to collect and store metrics from systems and applications using a pull-based model.


⚙️ How Prometheus Works

  • Prometheus scrapes data from exposed endpoints (typically /metrics).
  • It stores this data in its local time-series database (TSDB).
  • Querying is done using its powerful query language called PromQL.
  • It supports rule-based alerting via the companion Alertmanager component.
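
For example, a Node.js service can expose such a scrape endpoint with the prom-client package (a sketch; the port is illustrative):

const express = require('express');
const client = require('prom-client');

const app = express();

// Collect default process metrics (event loop lag, heap usage, etc.)
client.collectDefaultMetrics();

// Prometheus scrapes this endpoint on its own schedule (pull model)
app.get('/metrics', async (req, res) => {
  res.set('Content-Type', client.register.contentType);
  res.end(await client.register.metrics());
});

app.listen(9100);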

📌 Key Characteristics

| Feature | Description |
|---|---|
| 🔄 Pull-Based Model | Prometheus pulls metrics from targets, instead of targets pushing data |
| 📈 Metric-Focused | Handles only metrics (no support for logs or traces) |
| 🧠 PromQL | A flexible and powerful query language |
| 🚫 No Built-in Clustering | No native clustering or long-term storage out of the box |
| 🔗 Extensibility | Can be extended using projects like Thanos or Cortex for high availability |

👨‍💻 Real-World Example (Janitri Project)

In my project at Janitri, I used Prometheus alongside OpenTelemetry to collect real-time metrics related to API performance.

I visualized this data using Grafana, which gave immediate insights, although the setup required some effort and configuration.


🤝 Why Prometheus with OpenTelemetry?

OpenTelemetry is a telemetry generation and export framework — not a full observability stack.

It collects metrics, logs, and traces from applications using SDKs and exports them to a backend.

Prometheus is one such backend — specialized in metrics.


🔁 Integration Flow

  1. I used OpenTelemetry SDKs to instrument my application.
  2. Then I used the OpenTelemetry Collector to expose the metrics in Prometheus format on a /metrics endpoint.
  3. Prometheus scraped this data, stored it, and allowed me to:
    • Query it using PromQL
    • Set up alerts via Alertmanager

🔗 Conclusion

Prometheus completed what OpenTelemetry started

  • 🛠️ OTel was the producer
  • 🧠 Prometheus was the consumer, storage, and query engine

This architecture was:

  • Modular
  • 🔄 Flexible
  • 🔮 Future-proof

If needed, I could easily swap Prometheus with any OTLP-compatible backend (e.g., New Relic) without changing instrumentation code.

That’s the power of combining OpenTelemetry with open, pluggable tools like Prometheus.




🧭 Full Observability Stack using OpenTelemetry

This architecture illustrates how telemetry flows from instrumented code all the way to dashboards using tools like OpenTelemetry, Prometheus, Loki, Jaeger, and Grafana.


1️⃣ Instrumentation Layer (Your Code)

Add OpenTelemetry SDKs to generate telemetry (metrics, logs, traces).

You can use:

  • Auto-instrumentation agents

    (e.g. for Node.js, Python, Java)

  • Manual instrumentation

    (tracer.startSpan(), meter.record(), etc.)


2️⃣ Collector Layer

The OpenTelemetry Collector is the heart of the pipeline:

  • Receives data via receivers
  • Processes data (optional) via processors
  • Sends data to exporters (e.g., Prometheus, Jaeger)

You can run the Collector as:

  • 🟢 Agent – runs locally on each host (lightweight)
  • 🟣 Gateway – centralized telemetry router (common in prod)

3️⃣ Backend Layer

These are the specialized storage tools for each data type:

| Data Type | Tool | Purpose |
|---|---|---|
| Metrics | Prometheus | Monitoring, alerting, dashboards |
| Logs | Loki | Log aggregation & searchable logs |
| Traces | Jaeger/Tempo | Distributed tracing & request flow |

These tools store and index the telemetry so that Grafana (or New Relic) can query them.


4️⃣ Visualization Layer (Grafana)

  • Grafana connects to:

    • Prometheus (for metrics)
    • Loki (for logs)
    • Jaeger/Tempo (for traces)
  • Unified dashboards for all observability pillars

  • Create alerts (e.g., CPU > 80%, error rate > 5%)

  • Supports full correlation:

    • Logs → Traces → Metrics from one screen

🧠 Key Interview Lines You Can Drop

  • “The OpenTelemetry Collector acts as a hub where all telemetry — metrics, logs, traces — is routed, transformed, and exported.”

  • “Grafana sits on top as the visual UI, but the data lifeblood flows from instrumented apps through OpenTelemetry.”

  • “In a real production setup, this model gives me flexibility: swap out Prometheus with New Relic just by changing the exporter.”







Git Merge vs Rebase vs Squash - Complete Guide

The Problem

You have a feature branch with commits A, B, C. Meanwhile, commits D and E have landed on main. Now what?

main:     1---2---D---E
               \
feature:        A---B---C

Option 1: Git Merge 🔗

What happens:

git checkout main
git merge feature-branch

Result:

main: 1---2---D---E---M
           \         /
feature:    A---B---C

Simple Explanation:

  • A merge commit (M) is created
  • The history of both branches is preserved
  • The graph forms a "knot"-like structure

When to use:

  • When you want the complete history
  • When team collaboration needs transparency
  • When you want to track the feature branch's detailed development

Option 2: Git Rebase ↗️

What happens:

git checkout feature-branch
git rebase main

Result:

main: 1---2---D---E---A'---B'---C'

Simple Explanation:

  • Moves the feature branch commits onto the tip of main
  • Produces a clean, linear history
  • Original commits A, B, C become A', B', C' (new commit IDs)

When to use:

  • When you want a clean, linear history
  • To avoid complex merge conflicts
  • Preferred in professional projects

Option 3: Squash Commits 🗜️

What happens:

git checkout main
git merge --squash feature-branch
git commit -m "Add complete feature X"

Result:

main: 1---2---D---E---S

Simple Explanation:

  • Combines all the feature commits (A+B+C) into a single commit (S)
  • Main shows just one clean commit
  • The detail of the individual commits is lost on main

When to use:

  • When you want a clean history on main
  • When feature development details aren't needed on main
  • A popular approach on GitHub/GitLab

Real World Scenarios 🌍

Scenario 1: Small Personal Project

Use: Simple merge

  • History complexity doesn't matter
  • Quick and easy

Scenario 2: Professional Team Project

Use: Rebase + Fast-forward merge

  • Clean linear history
  • Easy to track changes
  • Professional appearance

Scenario 3: Open Source Project

Use: Squash commits

  • Main stays clean
  • Contributors' detailed work is preserved on the feature branch
  • Easy to review and roll back

Commands Summary 📝

Merge:

git checkout main
git merge feature-branch

Rebase:

git checkout feature-branch
git rebase main
git checkout main
git merge feature-branch  # Fast-forward merge

Squash:

git checkout main
git merge --squash feature-branch
git commit -m "Descriptive message"
