DEV Community

Mritunjay Singh

Backend Interview Sheet

1. What is Docker, and Why is it Used?

Docker is an open-source containerization platform that allows developers to package applications and their dependencies into isolated environments called containers. These containers ensure that applications run consistently across different environments.

🔹 Real-Life Example:

Imagine you're developing a MERN stack web app. It works fine on your laptop, but when your teammate runs it, they get "version mismatch" errors.

With Docker, you create a consistent environment across all machines, preventing such issues.

✅ Why Use Docker?

Docker is beneficial when you need:

  • Portability → Works on any OS without compatibility issues
  • Consistency → Eliminates "It works on my machine" problems
  • Lightweight → Uses fewer system resources than virtual machines
  • Scalability → Quickly scale applications with minimal overhead
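To make this concrete, here is a minimal Dockerfile sketch for a hypothetical Node.js app (the base image, file names, and port are illustrative assumptions, not from a specific project):

```dockerfile
# Build an image for a hypothetical Node.js app
FROM node:16-alpine

WORKDIR /app

# Copy dependency manifests first so this layer stays cached
# until package.json changes
COPY package*.json ./
RUN npm install

# Copy the rest of the application source
COPY . .

EXPOSE 3000
CMD ["node", "index.js"]
```

Anyone on the team can now build and run the same environment with `docker build` and `docker run`, which is exactly what prevents the "version mismatch" problem above.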

2. Main Components of Docker

๐Ÿ› ๏ธ 1. Docker Daemon (dockerd)

  • The background process that manages Docker containers
  • Listens for API requests and handles images, networks, and volumes

💻 2. Docker CLI (Command-Line Interface)

  • A tool to interact with the Docker Daemon
  • Common commands:
  docker ps        # List running containers  
  docker run       # Start a new container  
  docker stop      # Stop a running container  

📦 3. Docker Images

  • A read-only template containing the application, libraries, and dependencies
  • Immutable → Once built, images don't change
  • Used to create containers

📌 4. Docker Containers

  • A running instance of a Docker image
  • Isolated from the host system but can interact if needed (e.g., exposing ports)

🌐 5. Docker Hub

  • A cloud-based registry where Docker images are stored and shared

๐Ÿ—‚๏ธ 6. Docker Volumes

  • Used for persistent data storage outside of containers

📌 Illustration of Docker Components:

Diagram showing Docker architecture with Daemon, CLI, Images, Containers, and Volumes


3. How is Docker Different from Virtual Machines?

⚡ Example:

You're testing a React.js + Express.js app. Instead of running a full Ubuntu VM (which consumes high RAM & CPU), you start a lightweight container in seconds:

docker run -d -p 3000:3000 node:16

Unlike a VM, which takes minutes to boot, a container starts instantly.

🆚 Docker vs. Virtual Machines

| Feature | Docker (Containers) | Virtual Machines (VMs) |
| --- | --- | --- |
| Boot Time | Seconds | Minutes |
| Size | MBs | GBs |
| Performance | Near-native speed | Slower due to hypervisor overhead |
| Isolation | Process-level isolation | Full OS-level isolation |
| Resource Efficiency | Shares OS kernel, lightweight | Requires full OS, resource-intensive |

docker run vs. docker start vs. docker exec

docker run : Create and start a new container from an image
docker start : Restart a stopped container
docker exec : Run a command inside a running container


4. Popular and Useful Docker Commands

Here are some of the most commonly used Docker commands:

🔍 Container Management

# List all running containers
docker ps  

# List all containers (including stopped ones)
docker ps -a  

# Start a stopped container
docker start <container_id>  

# Stop a running container
docker stop <container_id>  

# Remove a container
docker rm <container_id>  

๐Ÿ— Image Management

# List all available images
docker images  

# Pull an image from Docker Hub
docker pull <image_name>  

# Remove an image
docker rmi <image_name>  

📦 Build and Run Containers

# Build a Docker image from a Dockerfile
docker build -t <image_name> .  

# Run a container from an image
docker run -d -p 8080:80 <image_name>  

📂 Volume Management

# List all Docker volumes
docker volume ls  

# Create a new volume
docker volume create <volume_name>  

# Remove a volume
docker volume rm <volume_name>  

Docker Compose: docker-compose.yml

What is docker-compose.yml?

The docker-compose.yml file is used to define and run multi-container Docker applications. With Docker Compose, you can manage and orchestrate multiple services, including databases, backend APIs, and front-end applications, all in a single file.

It allows you to define services, networks, and volumes, making it easier to deploy and manage applications that require multiple services working together.


Why is docker-compose.yml Useful?

  1. Simplifies Multi-Container Management:
    Instead of managing each container manually, Docker Compose allows you to define all services (frontend, backend, database, etc.) in one configuration file and launch them with a single command.

  2. Networking and Dependency Management:
    Docker Compose automatically creates a network for your containers, allowing them to communicate with each other. Services can be referenced by their service name, which means the backend can talk to the database without needing an IP address.

  3. One Command to Start Everything:
    Instead of running individual containers with complex docker run commands, Docker Compose lets you define the services and their dependencies in a YAML file, and run everything with docker-compose up.

  4. Simplified Development Environment:
    With Docker Compose, developers can easily replicate the production environment locally, using the same configuration for services like databases, backends, and frontends. It allows seamless integration and testing, as you don't have to manually set up each service.

  5. Environment Variable Management:
    You can manage environment variables for each service within the docker-compose.yml file, making it easier to configure your application for different environments (development, testing, production).


Example of docker-compose.yml for a Web Application

Let's walk through an example where we have three services:

  • Frontend: A React app running on port 3000.
  • Backend: A Node.js API running on port 5000.
  • Database: A MongoDB instance to store data.

version: '3.8'

services:
  frontend:
    build: ./frontend
    ports:
      - "3000:3000"
    volumes:
      - ./frontend:/app
    depends_on:
      - backend

  backend:
    build: ./backend
    ports:
      - "5000:5000"
    environment:
      - NODE_ENV=development
    depends_on:
      - database

  database:
    image: mongo
    volumes:
      - mongo-data:/data/db
    ports:
      - "27017:27017"

volumes:
  mongo-data:

Database Migrations

  1. Explain how you would design and manage a database schema using Sequelize, including the process of setting up migrations, handling model relationships, optimizing for performance, and managing database changes in a collaborative team environment.

Database Migration with Sequelize

Purpose

Database migrations allow you to safely update and manage your database schema over time. They help track changes to the schema in a version-controlled manner, making it easy to collaborate in teams.

Setting Up Migrations

  1. Initialize Sequelize with sequelize-cli to generate migration files.
  2. Migration files contain two primary methods:
    • up: For applying changes (e.g., create tables, add columns).
    • down: For rolling back changes (undoing the applied changes).

Handling Schema Changes

  • Creating Migrations:
    When you need to add, modify, or delete database schema (e.g., tables, columns), you create a new migration file.

  • Applying Migrations:
    Use the command npx sequelize-cli db:migrate to apply migrations to the database.

  • Rolling Back Migrations:
    Use npx sequelize-cli db:migrate:undo to undo the last applied migration.

Model Relationships

  • Define associations (e.g., one-to-many, many-to-many) within your models using Sequelize methods:
    • hasMany, belongsTo, belongsToMany, etc.

Collaborative Workflow

  1. Migrations should be version-controlled using Git.
  2. Each team member works with migrations, and when schema changes are required, new migrations are created and applied across all environments (development, staging, production).


Github Action

Reference

YouTube Video

GitHub Actions Workflow Diagram

Steps to Deploy on AWS EC2

1. Launch EC2 Instance

2. Add Secret Variables in GitHub

  • Go to GitHub Repo Settings → Secrets and Variables → Actions → Add Secret

3. Connect to EC2 Instance

Install Docker on AWS EC2
sudo apt-get update
sudo apt-get install docker.io -y
sudo systemctl start docker
sudo chmod 666 /var/run/docker.sock
sudo systemctl enable docker
docker --version
docker ps

4. Create Two Runners on the Same EC2 Instance

  • In React App → Actions → Runner → New Self-Hosted Runner
  • Copy the download commands and run them in the EC2 instance terminal
  • Install it as a service to keep it running in the background
sudo ./svc.sh install
sudo ./svc.sh start
  • Do the same for the Node.js Runner

5. Create a Dockerfile for Node.js (Backend)

6. Create a GitHub Actions Workflow

Create a .github/workflows/cicd.yml file

GitHub Actions Workflow Code Example

Docker Deployment Workflow Diagram

7. Push Docker Images to DockerHub

8. Add Inbound/Outbound Rules on EC2 Instance

9. Access the Node.js Application

  • Use EC2_PUBLIC_IP:PORT to access your application

Deploying React App

  • Create a Dockerfile for React
  • Follow the same process as above

What is GitHub Actions, and how does it work?

GitHub Actions is a CI/CD automation tool that allows you to define workflows in YAML to build, test, and deploy applications directly from GitHub repositories.

How do you trigger a GitHub Actions workflow?

Workflows can be triggered by events such as push, pull_request, schedule, workflow_dispatch, and repository_dispatch.

What are the key components of a GitHub Actions workflow?

Key components include:

  • Workflows (YAML files in .github/workflows/)
  • Jobs (Independent execution units in a workflow)
  • Steps (Commands executed in a job)
  • Actions (Reusable units of functionality)
  • Runners (Machines that execute jobs)
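A minimal workflow file tying these components together might look like this sketch (job names and commands are assumptions for a typical Node project):

```yaml
# .github/workflows/ci.yml  - the workflow
name: CI
on: [push, pull_request]          # triggering events
jobs:
  test:                           # a job
    runs-on: ubuntu-latest        # a runner
    steps:                        # steps inside the job
      - uses: actions/checkout@v4 # a reusable action
      - run: npm ci               # plain shell commands
      - run: npm test
```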

What is the difference between jobs, steps, and actions?

  • Jobs: Run in parallel or sequentially within a workflow.
  • Steps: Individual tasks executed within a job.
  • Actions: Pre-built reusable components within steps.

How do you use environment variables and secrets in GitHub Actions?

  • Define environment variables using env:
  env:
    NODE_ENV: production
  • Store sensitive values in secrets:
  env:
    API_KEY: ${{ secrets.API_KEY }}

What are self-hosted runners, and when should you use them?

Self-hosted runners are custom machines used to execute workflows instead of GitHub's hosted runners. Use them for private repositories, custom hardware, or specific dependencies.

How do you cache dependencies in GitHub Actions?

Use actions/cache@v3 to cache dependencies and speed up builds:

- uses: actions/cache@v3
  with:
    path: ~/.npm
    key: npm-${{ runner.os }}-${{ hashFiles('**/package-lock.json') }}
    restore-keys: npm-${{ runner.os }}

How do you create a reusable workflow in GitHub Actions?

Define a workflow with on: workflow_call and call it from another workflow:

on: workflow_call
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - run: echo "Reusable workflow"

How do you set up a CI/CD pipeline using GitHub Actions?

Define a workflow that includes jobs for building, testing, and deploying:

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - run: echo "Building..."
  test:
    runs-on: ubuntu-latest
    steps:
      - run: echo "Testing..."
  deploy:
    runs-on: ubuntu-latest
    needs: test
    steps:
      - run: echo "Deploying..."

What is the difference between workflow_dispatch, workflow_run, and schedule triggers?

  • workflow_dispatch: Manual trigger via GitHub UI/API.
  • workflow_run: Triggered when another workflow finishes.
  • schedule: Runs workflows at specific times using cron syntax.

How do you debug a failing GitHub Actions workflow?

  • Check logs in GitHub Actions UI.
  • Use set -x in bash scripts for verbose output.
  • Add continue-on-error: true to isolate issues.

How do you run a GitHub Actions workflow locally?

Use act, a tool that simulates GitHub Actions on your local machine:

act

How do you optimize and speed up GitHub Actions workflows?

  • Use caching (actions/cache@v3).
  • Run jobs in parallel when possible.
  • Use matrix builds for different environments.
  • Limit workflow execution to necessary branches.

How do you manage permissions and security in GitHub Actions?

  • Use least privilege principle for tokens (GITHUB_TOKEN).
  • Restrict secrets exposure to trusted workflows.
  • Use branch protection rules to limit workflow execution.


Websockets & Multi-backend system

Why Do Backends Need to Talk to Each Other?


In a typical client-server architecture, communication happens between the browser (client) and the backend server. However, as applications grow, keeping everything on a single server exposed to the internet becomes inefficient and unscalable.

When designing a multi-backend system, you need to consider:

  • If there are multiple services, how should they communicate when an event occurs?
  • Should it be an immediate HTTP call?
  • Should the event be sent to a queue?
  • Should the services communicate via WebSockets?
  • Should you use a Pub-Sub mechanism?

These decisions impact performance, scalability, and reliability.

Multi-Backend Communication - Final Interview Script

Question: "How do you handle communication between multiple backend services?"


Your Answer:

"When designing multi-backend systems, we have four main communication patterns, each serving different use cases.

1. HTTP/REST - Synchronous Communication

This is direct API calls between services. For example, when a user places an order, the User Service calls Order Service, which then calls Payment Service immediately.

Use case: When you need immediate response and strong consistency, like user authentication or payment validation.

Pros: Simple to implement, immediate feedback, strong consistency
Cons: Tight coupling, if one service fails, whole chain breaks

2. Message Queues - Asynchronous 1:1

Here we use message brokers like RabbitMQ or Amazon SQS. Messages are placed in queues and consumers pick them up when ready. It's point-to-point communication - only one consumer gets each message.

Use case: Task distribution, background job processing, load balancing
Example: Multiple payment workers processing payment requests from a queue

Pros: Loose coupling, fault tolerance, load balancing
Cons: Eventual consistency, more complex error handling

3. Pub-Sub - Event Broadcasting 1:N

Publishers send events to topics, and multiple subscribers listen to the same topic. Same message goes to all subscribers.

Use case: Event-driven architecture where multiple services need to react to same event
Example: When order is created, Inventory Service updates stock, Email Service sends confirmation, Analytics tracks metrics - all from same event

Pros: Highly decoupled, easy to add new features, scalable
Cons: Message ordering challenges, duplicate handling needed

4. WebSockets - Real-time Communication

Persistent bidirectional connections for real-time communication.

Use case: Chat applications, live updates, gaming
Pros: Real-time, low latency
Cons: Resource intensive, connection management complexity


Key Difference - Queue vs Pub-Sub:

Both have the same components: a Publisher/Producer, a Broker, and a Consumer/Subscriber. The difference is in message delivery:

  • Queue: 1:1 - Messages compete, only one consumer gets each message
  • Pub-Sub: 1:N - Same message broadcasted to all subscribers
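The delivery difference can be illustrated with a toy in-memory broker (a sketch only, not how Kafka or RabbitMQ are actually implemented):

```javascript
// Toy broker contrasting queue (1:1) and pub-sub (1:N) delivery.
class Broker {
  constructor() {
    this.queueConsumers = []; // queue workers compete for messages
    this.subscribers = [];    // pub-sub subscribers all get every message
    this.next = 0;
  }
  // Queue semantics: round-robin, exactly one consumer gets each message
  sendToQueue(msg) {
    const consumer = this.queueConsumers[this.next++ % this.queueConsumers.length];
    consumer(msg);
  }
  // Pub-sub semantics: broadcast the same message to every subscriber
  publish(msg) {
    this.subscribers.forEach((sub) => sub(msg));
  }
}

const broker = new Broker();
const received = { workerA: [], workerB: [], inventory: [], email: [] };
broker.queueConsumers.push(
  (m) => received.workerA.push(m),
  (m) => received.workerB.push(m)
);
broker.subscribers.push(
  (m) => received.inventory.push(m),
  (m) => received.email.push(m)
);

broker.sendToQueue('job1');       // only one worker gets it
broker.sendToQueue('job2');       // the other worker gets the next one
broker.publish('ORDER_CREATED');  // both subscribers receive it
```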

Real Example - E-commerce System:

I would use a hybrid approach:

  1. User places order - HTTP call for immediate validation
  2. Order processing - Pub-Sub event 'ORDER_CREATED' to notify multiple services
  3. Background tasks - Queue for heavy processing like report generation

Technology Stack:

  • Apache Kafka - Can work as both queue and pub-sub
  • RabbitMQ - For reliable message queuing
  • Redis Pub-Sub - For simple event broadcasting
  • Amazon SQS/SNS - For managed cloud solutions

Decision Framework:

Choose HTTP when: Need immediate response, strong consistency, simple flows
Choose Queues when: Task distribution, load balancing, background processing
Choose Pub-Sub when: Multiple services need same event, event-driven architecture
Choose WebSockets when: Real-time bidirectional communication needed


Production Considerations:

  • Error Handling: Circuit breakers, dead letter queues, retry mechanisms
  • Monitoring: Queue depths, processing times, error rates
  • Scalability: Horizontal scaling of consumers, proper partitioning

The key is choosing the right pattern for each specific use case rather than using one approach everywhere."


If Asked Follow-up Questions:

"What about data consistency?"

"For strong consistency, use HTTP calls. For eventual consistency, use async patterns with proper error handling and compensation transactions."

"How do you handle failures?"

"Circuit breakers for HTTP, dead letter queues for messages, retry mechanisms with exponential backoff, and proper monitoring."

"Which technology would you choose?"

"Kafka for high throughput and both queue/pub-sub needs, RabbitMQ for complex routing, SQS for simple cloud solutions."


Example: Payment Processing System


Let's consider a payment application. When a transaction occurs:

  1. The database update should happen immediately (synchronous).
  2. The notification (email/SMS) can be pushed to a queue (asynchronous).

Why not handle everything in the primary backend?

  • If the email service is down, should the user be forced to wait after completing the transaction? No!
  • Instead, we push the notification event to a queue.
  • Even if the notification service is down, the queue retains the event and sends notifications once the service is back.
  • This is why message queues (e.g., RabbitMQ, Kafka, AWS SQS) are better than HTTP for such tasks.

Types of Communication

  1. Synchronous Communication

    • The system waits for a response from the other system.
    • Examples: HTTP requests, WebSockets (in some cases).
  2. Asynchronous Communication

    • The system does not wait for a response.
    • Examples: Message queues, Pub-Sub services.

Why WebSockets?

WebSockets provide persistent, full-duplex communication over a single TCP connection, established through an HTTP upgrade handshake.

Limitations of HTTP:

  • In HTTP, the server cannot push events to the client on its own.
  • The client (browser) can request, and the server can respond, but the server cannot initiate communication with the client.

WebSockets vs. HTTP for Real-Time Applications

Example: Stock Market Trading System

  • Stock buying & selling generates millions of requests per second.
  • If you use HTTP, every new connection requires a TCP three-way handshake, adding latency and overhead.
  • With WebSockets, the handshake happens only once, and then the server and client can continuously exchange data.

Alternative: Polling

If you still want to use HTTP for real-time updates, an alternative approach is polling.

  • However, polling creates unnecessary load on the server by making frequent requests.
  • WebSockets are a more efficient solution for real-time updates.


Some Basic Questions

Basic

What is Node.js?

Node.js is a runtime environment for executing JavaScript on the server side. It is not a framework or a language. A runtime is responsible for memory management and converting high-level code into machine code.

Examples:

  • Java: JVM (Runtime) → Spring (Framework)
  • Python: CPython (Runtime) → Django (Framework)
  • JavaScript: Node.js (Runtime) → Express.js (Framework)

With Node.js, JavaScript can run outside the browser as well.

Runtime vs Frameworks

  • Runtime: Focuses on executing code, handling memory, and managing I/O.
  • Framework: Provides structured tools and libraries to simplify development.

What happens when you enter a URL in the browser and hit enter?

DNS Lookup

The browser checks if it already knows the IP address for www.example.com.
If not, it contacts a DNS (Domain Name System) server to get the IP address (e.g., 192.168.1.1).

Establishing Connection

The browser initiates a TCP connection with the web server using a process called three-way handshake.
If the website uses HTTPS, a TLS handshake happens to encrypt the communication.

Sending HTTP Request

The browser sends an HTTP request to the server:

GET / HTTP/1.1
Host: www.example.com

Server Processing

The web server processes the request and may:
    Fetch data from a database
    Generate a response (HTML, JSON, etc.)

Receiving the Response

The server sends an HTTP response back to the browser:

HTTP/1.1 200 OK
Content-Type: text/html

Rendering the Page

The browser processes the HTML, CSS, and JavaScript and displays the webpage.

Difference Between Monolithic and Microservices Architecture

Monolithic Architecture

  • All components (UI, DB, Auth, etc.) are tightly coupled.
  • Single application handles everything.

Microservices Architecture

  • Divided into small, independent services.
  • Each service handles a specific function (Auth, Payments, etc.).

Pros:

  • Scalable
  • Services can use different tech stacks

Cons:

  • More complex to manage
  • Requires API communication

HTTP Status Codes

  • 200 OK
  • 201 Created
  • 400 Bad Request
  • 401 Unauthorized
  • 402 Payment Required
  • 404 Not Found
  • 405 Method Not Allowed
  • 500 Internal Server Error

What is CORS?

CORS stands for Cross-Origin Resource Sharing, a security feature built into browsers.
It blocks requests from one origin (domain, protocol, or port) to another origin unless the server explicitly allows them.
For example: your frontend is hosted at frontend.com and your backend at backend.com.
The browser treats these as different origins and blocks the request unless it is explicitly allowed.
Why does this happen?
CORS errors are triggered by the Same-Origin Policy, which prevents malicious websites from making unauthorized API calls using your credentials.

The browser isn't blocking the request itself; it's blocking the response, for security reasons.
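On the server side, fixing a CORS error means sending the right response headers. A minimal sketch in JavaScript, using a plain object as a stand-in for the response (the allowed origin value is an assumption):

```javascript
// Minimal CORS handling sketch: set allow headers and answer preflight.
function applyCors(req, res) {
  res.headers = res.headers || {};
  res.headers['Access-Control-Allow-Origin'] = 'https://frontend.com'; // allowed origin
  res.headers['Access-Control-Allow-Methods'] = 'GET,POST,PUT,DELETE';
  res.headers['Access-Control-Allow-Headers'] = 'Content-Type,Authorization';
  if (req.method === 'OPTIONS') { // preflight request sent by the browser
    res.statusCode = 204;         // no body needed for a preflight response
    return true;                  // handled: stop here
  }
  return false;                   // continue to normal request handling
}

const res = { headers: {} };
const handled = applyCors({ method: 'OPTIONS' }, res);
```

In an Express app the same idea is usually handled by the `cors` middleware package rather than hand-written headers.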

REST vs GraphQL

REST API:

"REST (Representational State Transfer) is an architectural style where data is fetched using multiple endpoints, and each request returns a fixed structure of data."

GraphQL:

"GraphQL is a query language for APIs that allows clients to request only the data they need, reducing overfetching and underfetching."

💡 Key Point:

  • REST APIs have multiple endpoints (/users, /orders), while GraphQL has a single endpoint (/graphql).
  • GraphQL provides more flexibility by allowing clients to request exactly what they need in a single query.
  • REST APIs return predefined responses and sometimes require multiple requests.
  • If performance and flexibility are key concerns, GraphQL is a better choice.

How Do You Design an API for a Large-Scale System?

  • Use Microservices: Separate services (Auth, Payments, etc.).
  • Load Balancers: Distribute traffic efficiently.
  • Caching: Use Redis for frequently accessed data.
  • Pagination: Send data in chunks.
  • Rate Limiting: Prevent API abuse.

What is Pagination? How to Implement It?

Pagination breaks large datasets into smaller parts.
Implementation:

  • Use limit and offset in database queries.
  • Example:
  SELECT * FROM users LIMIT 10 OFFSET 20;
  • Use cursor-based pagination for better performance.
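Cursor-based pagination can be sketched in plain JavaScript; here the `id` field acts as the cursor (an assumption for illustration):

```javascript
// Cursor pagination: return items after a given id, plus the next cursor.
// Assumes `items` are sorted by ascending id.
function paginate(items, cursor, limit) {
  const start = cursor == null ? 0 : items.findIndex((i) => i.id > cursor);
  const page = items.slice(start, start + limit);
  // A full page suggests more data; expose the last id as the next cursor.
  const nextCursor = page.length === limit ? page[page.length - 1].id : null;
  return { page, nextCursor };
}

const users = [{ id: 1 }, { id: 2 }, { id: 3 }, { id: 4 }, { id: 5 }];
const first = paginate(users, null, 2);              // ids 1, 2
const second = paginate(users, first.nextCursor, 2); // ids 3, 4
```

In SQL this corresponds to something like `SELECT * FROM users WHERE id > :cursor ORDER BY id LIMIT 10;`, which stays fast on large tables because the database seeks via the index instead of skipping OFFSET rows.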

How Do You Handle File Uploads?

  • Single file upload: Use multipart/form-data with Express.js & Multer.
  • Large file handling: Use chunked uploads.
  • Storage options: Store files on AWS S3, Google Cloud Storage, or a database.
  • Server-side Upload: The file is uploaded to your backend server first, and then the server sends it to S3 or Cloudinary.

JWT - Final Interview Answer Script

Question: "What is JWT? How does it work?"


Your Complete Answer:

"JWT stands for JSON Web Token. It's a stateless authentication mechanism where user information is encoded in a token that can be verified without storing session data on the server.

JWT Structure - 3 Parts:

JWT has three parts separated by dots:
header.payload.signature

1. Header: Contains metadata about the token

{
  "alg": "HS256",    // Algorithm used
  "typ": "JWT"       // Token type
}

2. Payload: Contains user information and claims

{
  "userId": 123,
  "role": "admin",
  "exp": 1640995200    // Expiry timestamp
}

3. Signature: Ensures token integrity and authenticity

  • Created by signing the header + payload with a secret key
  • Used to verify the token hasn't been tampered with

How JWT Authentication Works:

Step 1 - User Login:

  • User sends credentials to server
  • Server validates credentials
  • If valid, server creates JWT token

Step 2 - Token Creation:

  • Server creates header and payload
  • Server generates signature using secret key: HMAC-SHA256(header.payload, secretKey)
  • All three parts are combined: header.payload.signature

Step 3 - Token Usage:

  • Server sends token to client
  • Client stores token (localStorage or cookie)
  • Client sends token in Authorization header for API requests

Step 4 - Token Verification:

  • Server receives token with request
  • Server splits token into three parts
  • Server recreates signature using same secret key
  • If signatures match, token is valid
  • Server extracts user info from payload

Key Benefits:

Stateless: No need to store session data on server
Scalable: Works across multiple servers
Self-contained: All user info is in the token
Cross-domain: Can work across different domains

Security Considerations:

Secret Key: Never expose the secret key used for signing
Expiry: Always set short expiry times (15-30 minutes)
HTTPS: Always use HTTPS to prevent token interception
Storage: Be careful about XSS if storing in localStorage

Real-world Example:

When user logs into an e-commerce site:

  1. User enters username/password
  2. Server validates and creates JWT with user ID, role, expiry
  3. Client stores JWT and sends it with every API call
  4. Server verifies JWT and processes request
  5. When token expires, user needs to login again or refresh token

JWT vs Sessions:

JWT:

  • Stateless (no server storage)
  • Better for APIs and microservices
  • Self-contained

Sessions:

  • Stateful (server stores session data)
  • Better for traditional web apps
  • More secure (data on server)

The choice depends on your architecture - use JWT for REST APIs and distributed systems, sessions for traditional web applications."


If Asked Follow-up Questions:

"How do you handle token expiry?"

"Use refresh tokens. Short-lived access tokens (15 mins) with longer-lived refresh tokens (7 days). When access token expires, use refresh token to get new access token."

"What if someone steals the JWT?"

"That's why we use short expiry times, HTTPS only, and httpOnly cookies when possible. Also implement token blacklisting for logout."

"Can JWT be modified?"

"If someone modifies the payload, the signature won't match because they don't have the secret key. Server will reject the token."

"Where do you store JWT on client?"

"For web apps: httpOnly cookies for security, or localStorage for convenience but with XSS risk. For mobile: secure storage."


Question: "Explain Cookies, Sessions, Tokens, and Local Storage for authentication."


Your Answer:

"These are four different ways to handle user authentication and data storage. Let me explain each:

1. COOKIES - Automatic Browser Storage

What it is:
Cookies are small pieces of data that the server sends to the browser, and the browser automatically sends them back with every request.

How it works:

  • Server creates cookie and sends to browser
  • Browser stores it automatically
  • Browser includes cookie in every HTTP request to that domain
  • Server reads cookie data from request

Authentication use:

User logs in → Server creates cookie: authId=abc123 → Browser stores it →
Every request includes: Cookie: authId=abc123 → Server validates cookie

Example: When you login to Facebook, server sets cookie with session ID. Now every page you visit automatically sends this cookie.


2. SESSIONS - Server-Side Storage

What it is:
Session is user data stored on the server, identified by a session ID that's typically stored in a cookie.

How it works:

  • User logs in → Server creates session data in memory/database
  • Server generates unique session ID
  • Session ID is sent to browser via cookie
  • Browser sends session ID back with requests
  • Server looks up session data using this ID

Authentication flow:

Login → Server creates: sessions[abc123] = {userId: 456, role: 'admin'} →
Cookie: sessionId=abc123 → Server uses ID to fetch user data

Example: Traditional web applications where user data is stored on server for security.


3. TOKENS (JWT) - Self-Contained Authentication

What it is:
A token is an encoded string containing user information that can be verified without storing anything on the server.

How JWT works:

  • Contains 3 parts: Header.Payload.Signature
  • Payload has user info (userId, role, expiry)
  • Signature ensures token hasn't been tampered with
  • Server can verify token without database lookup

Authentication flow:

Login → Server creates JWT token with user info → Client stores token →
Client sends: Authorization: Bearer <token> → Server verifies signature

Example: REST APIs where each request includes JWT token in Authorization header.


4. LOCAL STORAGE - Browser Client Storage

What it is:
Browser's built-in storage that persists data locally, accessible via JavaScript.

How it works:

  • JavaScript can store/retrieve data: localStorage.setItem('token', 'abc123')
  • Data persists even after browser closes
  • Available to JavaScript on same domain
  • 5-10MB storage capacity

Authentication use:

Login → Store token: localStorage.setItem('authToken', token) →
API calls → Get token: localStorage.getItem('authToken') →
Send manually: headers: { Authorization: 'Bearer ' + token }

Example: Single Page Applications (SPAs) where JavaScript manages authentication.


Key Differences Summary:

Storage Location:

  • Cookies: Browser (managed automatically)
  • Sessions: Server-side (secure)
  • Tokens: Client-side (self-contained)
  • Local Storage: Browser (manual JavaScript)

Security:

  • Cookies: Can be HttpOnly (XSS safe), but CSRF risk
  • Sessions: Most secure (data on server)
  • Tokens: Stateless but vulnerable if stolen
  • Local Storage: Vulnerable to XSS attacks

Usage:

  • Cookies: Automatic with every request
  • Sessions: Server looks up data using session ID
  • Tokens: Manual inclusion in headers
  • Local Storage: Manual JavaScript handling

When to Use What:

Use Cookies + Sessions when:

  • Traditional web applications
  • Maximum security needed
  • Server-side rendering
  • Simple user flows

Use Tokens (JWT) when:

  • REST APIs
  • Mobile applications
  • Microservices architecture
  • Need stateless authentication

Use Local Storage when:

  • Single Page Applications (SPAs)
  • Need persistent client-side data
  • Want manual control over auth flow
  • Client-side JavaScript frameworks



Intermediate

What is full text search?

What are Serverless and Serverful backends?

A serverful backend means you manage the entire server yourself, while a serverless backend means you don't manage servers at all: your code runs on demand on cloud platforms like AWS Lambda.
Example: Imagine you are building a food delivery app like Zomato or Uber Eats.

If you use a serverful backend:
    You set up an Express.js server on AWS EC2.
    The server is always running, handling all API requests like fetching restaurants, placing orders, and tracking deliveries.
    You pay for the server 24/7, even when there are no active users.

If you use a serverless backend:
    You use AWS Lambda functions to handle API requests.
    When a user places an order, the function runs only for that request and then shuts down.
    You only pay for execution time, making it cost-effective.
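The serverless flow can be sketched as a minimal handler (a hypothetical, simplified AWS Lambda-style handler; real Lambda handlers are usually async and receive a full API Gateway event):

```javascript
// Hypothetical Lambda-style handler for the "place order" request.
// It runs only when invoked and holds no server state between calls.
function handler(event) {
  const order = JSON.parse(event.body); // API Gateway passes the body as a string
  if (!order.restaurantId || !order.items?.length) {
    return { statusCode: 400, body: JSON.stringify({ error: 'Invalid order' }) };
  }
  // ...persist to a database, trigger notifications, etc.
  return {
    statusCode: 201,
    body: JSON.stringify({ orderId: 'ord_123', status: 'PLACED' }),
  };
}

const res = handler({
  body: JSON.stringify({ restaurantId: 'r1', items: ['pizza'] }),
}); // res.statusCode === 201
```

You pay only for the milliseconds this function actually runs, which is what makes the model cost-effective for bursty traffic.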

Can you explain single-threaded vs. multi-threaded processing?

Single-threaded programs execute one task at a time, while multi-threaded programs can execute multiple tasks in parallel. However, single-threaded systems can still be asynchronous using event loops, as in Node.js. If I were building a CPU-intensive app like a video editor, I'd go with multi-threading. But for an API server handling many concurrent users, I'd use a single-threaded, asynchronous model like Node.js to handle requests efficiently.
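A quick way to see the single-threaded, asynchronous model in action: in Node.js, synchronous code finishes first, then queued callbacks run, so one thread can interleave many tasks without blocking:

```javascript
// Node.js runs on one thread, but the event loop interleaves work:
// synchronous code first, then microtasks (Promises), then timers.
const order = [];

order.push('sync');                                    // runs immediately
setTimeout(() => order.push('timer'), 0);              // macrotask queue
Promise.resolve().then(() => order.push('microtask')); // microtask queue

setTimeout(() => {
  console.log(order.join(' then ')); // sync then microtask then timer
}, 10);
```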

๐Ÿง  Web Server Request Handling โ€“ Full Interview Deep Dive

Understand how web servers handle various types of requests, what part of the system gets triggered, and why CPU, disk, and memory are used in different ways.


๐Ÿ”น Case 1: Static File Request (e.g., GET /index.html)

๐Ÿงฑ Architecture:

Client โ†’ Web Server (Nginx, Apache) โ†’ Disk

| Step | Description | CPU Used? | Why |
|------|-------------|-----------|-----|
| 1 | TCP Connection Establishment | โœ… | OS uses CPU threads to handle the new socket connection |
| 2 | TLS Handshake (if HTTPS) | โœ…โœ… | Public-key crypto (RSA/ECC), key exchange: very CPU intensive |
| 3 | HTTP Request Parsing | โœ… | Server reads headers, URL, method |
| 4 | Check In-Memory Cache | โš ๏ธ Sometimes | If the file is cached, skip disk I/O (saves time and CPU) |
| 5 | Disk I/O: Read File | โš ๏ธ + I/O | Slowest part if uncached (mechanical disk = even slower) |
| 6 | Build HTTP Response | โœ… | Add headers, content-type, status, etc. |
| 7 | Send Response (TCP Send) | โœ… | Network stack and syscalls involve CPU |

โœ… Conclusion:

  • Mostly I/O bound, but CPU handles parsing & networking
  • With HTTPS, CPU spikes due to encryption

๐Ÿ”น Case 2: Dynamic Request (Backend involved)

e.g., GET /profile?id=10

๐Ÿงฑ Architecture:

Client โ†’ Web Server โ†’ Backend Server โ†’ DB

| Step | Description | CPU Used? | Why |
|------|-------------|-----------|-----|
| 1 | TCP + TLS Handshake | โœ…โœ… | Same as the static case |
| 2 | Request Parsing | โœ… | Headers, query params |
| 3 | Reverse Proxy to Backend | โœ… | Web server forwards via IPC/port |
| 4 | Backend App Logic | โœ…โœ… | Routing, auth, business logic (CPU heavy) |
| 5 | Database Query | โš ๏ธ CPU + I/O | Reads/writes involve disk and DB engine CPU |
| 6 | Response Generation (HTML/JSON) | โœ…โœ… | Templating or serialization is CPU-bound |
| 7 | Send Response โ†’ Client | โœ… | Network transmission |

โœ… Conclusion:

  • This is both CPU + I/O bound
  • More cores help in scaling
  • Backend does the heavy lifting, web server is just the router

๐Ÿ”น Case 3: Cached Response

๐Ÿงฑ Architecture:

Client โ†’ Web Server โ†’ Cache (Redis/Memcached/internal) โ†’ Client

| Step | Description | CPU Used? | Why |
|------|-------------|-----------|-----|
| 1 | TCP + HTTP Parsing | โœ… | Normal |
| 2 | Cache Lookup (Memory) | โš ๏ธ | Fast RAM lookup, nearly no disk or backend call |
| 3 | Response Ready โ†’ Send | โœ… | Minimal CPU for sending back |

โœ… Conclusion:

  • Fastest flow among all
  • Skips backend & disk I/O โ†’ highly efficient
  • Caching = performance booster

๐Ÿ”น Case 4: Reverse Proxy (Static + Dynamic Mix)

๐Ÿงฑ Architecture:

Client โ†’ Nginx (Reverse Proxy) โ†’ Static OR Backend

| Step | Description | CPU Used? | Why |
|------|-------------|-----------|-----|
| 1 | Request to Nginx | โœ… | Parses incoming request |
| 2 | Nginx Checks Routes | โœ… | Matches URI patterns |
| 3 | Serve Static (if matched) | โš ๏ธ | Disk read if not cached |
| 4 | Else Proxy to Backend | โœ… | Same as Case 2 |
| 5 | Send Response Back | โœ… | Nginx acts as gateway |

โœ… Conclusion:

  • Nginx = Traffic Manager
  • Smart separation between static and dynamic content
  • Efficient request routing saves resources

๐Ÿ”น Case 5: HTTPS (TLS) Request

| Step | Description | CPU Used? | Why |
|------|-------------|-----------|-----|
| 1 | TCP Connection | โœ… | Basic connection setup |
| 2 | TLS Handshake | โœ…โœ…โœ… | Expensive: cert validation, RSA/AES/ECC operations |
| 3 | HTTP Parsing | โœ… | After the TLS tunnel is established |

โœ… Conclusion:

  • TLS is CPU-heavy
  • TLS offloading to Cloudflare or a load balancer is often used

๐Ÿ”น Case 6: API Request (POST JSON)

๐Ÿงฑ Architecture:

Client โ†’ Web Server/API Gateway โ†’ Backend โ†’ DB

| Step | Description | CPU Used? | Why |
|------|-------------|-----------|-----|
| 1 | Receive POST | โœ… | TCP + header parsing |
| 2 | JSON Body Parsing | โœ…โœ… | Deserialization consumes CPU |
| 3 | Business Logic | โœ…โœ… | Auth, validation, core logic |
| 4 | DB Query | โš ๏ธ | DB fetch/update |
| 5 | Build JSON Response | โœ…โœ… | JSON.stringify() or equivalent |
| 6 | Send Response | โœ… | Network syscall |

โœ… Conclusion:

  • APIs (especially large JSON) are CPU-bound
  • Parsing/serializing JSON = CPU cycles
  • Use optimized libraries (like fast-json-stringify, etc.)
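The deserialize, validate, serialize pipeline from the table can be sketched as a plain function (names and the stubbed DB step are illustrative):

```javascript
// The POST /api pipeline as a plain function: deserialize (CPU),
// business logic (CPU), serialize (CPU). The DB step is stubbed here;
// in a real service it would be the I/O-bound part.
function handlePost(rawBody) {
  const input = JSON.parse(rawBody);          // step 2: JSON body parsing (CPU)
  if (typeof input.amount !== 'number') {     // step 3: validation / business logic
    return { status: 400, body: JSON.stringify({ error: 'amount required' }) };
  }
  const record = { id: 1, amount: input.amount };       // step 4: pretend DB insert (stub)
  return { status: 201, body: JSON.stringify(record) }; // step 5: serialize response (CPU)
}

const ok = handlePost('{"amount": 99}'); // ok.status === 201
```

Every request pays the JSON.parse and JSON.stringify cost, which is why large payloads make these APIs CPU-bound.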

๐Ÿ”น Case 7: File Upload / Download

๐Ÿงฑ Architecture:

Client โ†’ Web Server โ†’ Disk / Object Store (e.g., S3)

| Step | Description | CPU Used? | Why |
|------|-------------|-----------|-----|
| 1 | TCP + Parse | โœ… | Start request |
| 2 | Read File Chunks (Upload) | โœ… + I/O | Buffered I/O reads |
| 3 | Write to Disk/S3 | โš ๏ธ | Disk or network-based I/O |
| 4 | Send Acknowledgement | โœ… | Final response |

โœ… Conclusion:

  • I/O-bound process, CPU handles chunking and buffering
  • Network & Disk performance matter a lot here


HTTP/2 and HTTP/3 Support in Web Servers


๐Ÿ”น What is HTTP?

  • HTTP (HyperText Transfer Protocol) is an application-layer protocol used for communication between clients (like browsers) and web servers.
  • Versions: HTTP/1.1 โ†’ HTTP/2 โ†’ HTTP/3

๐Ÿš€ Why HTTP/2 and HTTP/3?

  • To improve latency, reduce page load times, and utilize modern internet features like multiplexing, better compression, and faster handshake.

๐Ÿ”ธ HTTP/1.1 Limitations (Why Upgrade?)

  • Head-of-line (HOL) blocking: One slow resource blocks others.
  • Multiple TCP connections needed โ†’ overhead.
  • No compression of headers.
  • High latency in handshake and transfer.

โœ… HTTP/2 Features

1. Multiplexing

  • Multiple streams (requests/responses) over a single TCP connection.
  • No need for multiple TCP connections.
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ Browser    โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚ req1       โ”‚โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ req2       โ”‚โ”€โ”€โ”€โ”€โ”€โ–บโ”‚
โ”‚ req3       โ”‚โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
โ”‚            โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
         โ†“
     One TCP connection

2. Binary Framing

  • All messages (headers, data) are encoded in binary format instead of plain text โ†’ faster and more compact.

3. Header Compression (HPACK)

  • HTTP headers are compressed to save bandwidth.

4. Server Push (Optional)

  • Server can "push" resources (CSS/JS/fonts) before the client even asks.
  • Useful in predictable page loads.

โ†’ Client: GET /index.html
โ† Server: /index.html + /style.css + /app.js (pushed without asking)

HTTP/3: What Changed Again?

โœ… Uses QUIC protocol instead of TCP

QUIC = Quick UDP Internet Connections (built by Google)
Why QUIC?

TCP has these problems:

  • Slow connection setup (3-way handshake).
  • Head-of-line blocking at the TCP level.
  • Connection loss resets everything.


๐Ÿง  Web Server vs Application Server - Deep Dive


๐Ÿ–ฅ๏ธ 1. What is a Web Server?

๐Ÿ”ง Primary Role:

A web server handles static content such as:

  • HTML
  • CSS
  • JavaScript
  • Images (JPG, PNG, etc.)

It serves files directly from disk to the client browser.

๐Ÿ’ก Think of a Web Server like a waiter โ€” it brings pre-cooked food (static files) to your table.


โš™๏ธ Features of Web Server

| Feature | Description |
|---------|-------------|
| Static File Serving | Serves .html, .css, .js, images directly from the file system |
| SSL/TLS Termination | Handles HTTPS encryption/decryption (SSL certificates) |
| Caching | Stores frequently requested files in memory to improve speed |
| Load Balancing | Distributes incoming requests across multiple App Servers |

๐ŸŒ Popular Web Servers

  • Apache HTTPD (older but reliable)
  • Nginx (very fast, efficient)
  • Caddy (auto HTTPS with Let's Encrypt)

๐Ÿญ 2. What is an Application Server?

๐Ÿ”ง Primary Role:

An Application Server handles dynamic content. It:

  • Executes backend code
  • Fetches data from databases
  • Performs business logic

๐Ÿ’ก Think of an Application Server as a chef โ€” it cooks fresh food (generates dynamic content) based on your order (request).


โš™๏ธ Features of App Server

| Feature | Description |
|---------|-------------|
| Code Execution | Runs backend code (e.g. Express, Django, Spring Boot) |
| DB Connectivity | Connects to databases like MySQL, MongoDB, PostgreSQL |
| Session Management | Maintains user session, login state, etc. |
| Transactions | Ensures atomic DB operations (commit or rollback) |

๐Ÿ’ก Common Examples

| Language | Frameworks / Application Servers |
|----------|----------------------------------|
| Node.js | Express.js, NestJS |
| Java | Tomcat, Jetty, WildFly |
| Python | Django, Flask, FastAPI |
| PHP | Laravel, Symfony |

๐Ÿ”„ 3. How They Work Together

Client (Browser / Mobile App)
โฌ‡๏ธ
Web Server (Nginx / Apache)
โฌ‡๏ธ
Static Route? โžก๏ธ Serve static file directly
โฌ‡๏ธ
Dynamic Route? โžก๏ธ Forward to App Server
โฌ‡๏ธ
App Server (Express / Django)
โฌ‡๏ธ
DB, Business Logic Execution
โฌ‡๏ธ
Response sent back via Web Server
โฌ‡๏ธ
Client receives result



Why do we separate static and dynamic content handling?

  • Performance: Static files (e.g., images, JS) can be cached and served quickly by a web server like Nginx.
  • Scalability: Separation allows static content to be offloaded from the heavier app server.
  • Security: Keeps the app logic isolated; static servers don't need access to databases or internal logic.
  • Simplicity: Web servers are optimized for speed and concurrency, while app servers are optimized for logic and computation.
  1. Can a single server act as both web and application server?

โœ… Yes, especially in small-scale setups.

Node.js Express, Django, and Spring Boot can all serve both static and dynamic content.

However, in production, it's a best practice to separate them:

  • Nginx (web server) handles routing, SSL, compression.
  • The app server handles dynamic requests.
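That production split can be sketched in an Nginx config (a hypothetical example; paths, domain, and port are placeholders):

```nginx
# Hypothetical config: static files served directly, dynamic routes proxied.
server {
    listen 443 ssl;
    server_name example.com;
    ssl_certificate     /etc/nginx/certs/fullchain.pem;  # placeholder paths
    ssl_certificate_key /etc/nginx/certs/privkey.pem;

    # Static content: served straight from disk, cacheable
    location /static/ {
        root /var/www/app;
        expires 7d;
    }

    # Everything else: forwarded to the app server
    location / {
        proxy_pass http://127.0.0.1:3000;   # Express/Django app server
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```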

โš™๏ธ Technical

  1. How does Nginx improve performance with caching and load balancing?

Caching:

  • Stores frequent responses (e.g., HTML pages, JSON APIs) in memory.
  • Reduces load on backend app servers and databases.

Load Balancing:

  • Distributes incoming traffic across multiple app servers.
  • Methods: Round Robin, Least Connections, IP Hash.
  • Ensures high availability and scalability.

Extra features:

  • Connection pooling
  • GZIP compression
  • SSL offloading

  2. What happens when an HTTPS request reaches Nginx?

TLS Handshake:

  • Nginx decrypts the request using the SSL certificate.
  • Ensures data confidentiality and authenticity.

Routing:

  • Nginx uses server_name and location blocks to match the request.

Proxying (if configured):

  • Passes the decrypted request to a backend app server over HTTP (or internal HTTPS).

Response:

  • Nginx sends the encrypted response back to the client.

โœ… You can also use Nginx as a reverse proxy + SSL terminator.



โ›“๏ธ What Is a Presigned URL?

A presigned URL is a special type of temporary, secure link that allows someone to access a specific resource โ€” like a file in cloud storage โ€” without logging in or having permanent credentials.

It gives permission to perform actions like:

  • ๐Ÿ”ผ Uploading a file
  • ๐Ÿ”ฝ Downloading a file
  • โŒ Deleting a file

... for a limited time.

This is especially useful when you:

  • Want users to upload or download files without giving them full access to your server or cloud.
  • Need secure sharing without managing login systems or API keys.

๐Ÿ› ๏ธ How It Works (Behind the Scenes)

Letโ€™s break down the upload process using a YouTube-like example:

โœ… Step 1: Client Requests a Presigned URL

When a user wants to upload a video, the client (e.g., browser or mobile app) sends a request to YouTubeโ€™s backend asking for a presigned URL.

โœ… Step 2: Server Generates Presigned URL

The backend (YouTube server) generates a secure, short-lived URL using:

  • The file path (Key)
  • HTTP method (PUT for upload)
  • Expiry time (e.g., 5 minutes)
  • A cryptographic signature created using AWS credentials

โœ… Step 3: URL Is Sent to Client

The server returns the presigned URL to the userโ€™s device.

โœ… Step 4: Client Uploads File Directly to Cloud

The client uploads the video directly to S3 using the URL, bypassing the application server entirely.

โœ… Step 5: S3 Validates & Stores the File

S3 checks the URLโ€™s validity:

  • Is the signature correct?
  • Has the URL expired?

If valid, the upload is accepted and stored. The backend can then be notified to process or catalog the file.


โš™๏ธ Whatโ€™s Inside a Presigned URL?

A presigned URL contains:

  • The target resource (bucket + file path)
  • The action allowed (PUT, GET, DELETE)
  • Expiry timestamp
  • A secure signature (HMAC with access key)

This ensures that only authorized, time-bound operations are allowed.


๐Ÿš€ Why Use Presigned URLs Instead of Traditional Uploads?

| Traditional Upload | Presigned URL |
|--------------------|---------------|
| File flows through the backend | File uploads directly to the cloud |
| Backend must handle large files | Backend just creates the URL |
| Slower and expensive | Fast and scalable |
| Higher server load | Offloaded to the cloud (e.g., S3) |
| Exposes infrastructure to risks | Link auto-expires, more secure |

โœ… Presigned URLs are:

  • ๐Ÿš€ Faster
  • ๐Ÿ’ฐ Cheaper
  • ๐Ÿ” More secure
  • ๐ŸŒ Easier to scale

๐ŸŒ AJAX โ€“ Asynchronous JavaScript and XML

โœ… What is AJAX?

AJAX is a technique used in web development to send and receive data from a server asynchronously without reloading the entire web page.

๐Ÿ” AJAX allows partial page updates, making web apps fast and interactive.


๐Ÿง  Full Form:

Asynchronous JavaScript And XML (originally XML; today JSON is mostly used)


๐Ÿ“ฑ Real-World Example:

Google Search Suggestions:

When you type in Googleโ€™s search bar, suggestions appear immediately without reloading the page. This is powered by AJAX.


โš™๏ธ Technologies Involved:

| Technology | Role |
|------------|------|
| HTML/CSS | Structure & styling |
| JavaScript | Logic and events |
| XMLHttpRequest / fetch() | Send/receive data to/from the server |
| JSON/XML | Data format used for communication |
| DOM | Updates the web page dynamically |

๐Ÿ” How AJAX Works (Step-by-Step):

  1. User interacts with the web page (e.g., clicks a button).
  2. JavaScript sends a request to the server (in background).
  3. Server processes the request and sends data back.
  4. JavaScript receives the data and updates the web page (without reload).

๐Ÿ“ฆ Example Code (Using fetch API):

// Send AJAX request to server
fetch('/api/user')
  .then(response => response.json())
  .then(data => {
    // Update page dynamically
    document.getElementById('username').innerText = data.name;
  });


Database Partitioning vs Sharding

๐Ÿ” Introduction

As data grows exponentially in modern systems, managing and querying large datasets efficiently becomes critical. Two common approaches to handle large-scale databases are:

  • Partitioning: Dividing data within a single database.
  • Sharding: Distributing data across multiple databases or servers.

Both techniques improve performance, scalability, and maintainability, but they serve different purposes and operate at different levels of system architecture.


1๏ธโƒฃ What is Partitioning?

โœ… Definition:

Partitioning is the process of dividing a single large table or index into smaller, manageable pieces called partitions.

These partitions are still part of the same logical table and are managed by the same database engine.

๐Ÿ”ง Types of Partitioning:

| Type | Description | Use Case |
|------|-------------|----------|
| Range | Data split by value range in a column | Time-based data (logs, sales) |
| List | Data split by discrete column values | Country/region/user-type |
| Hash | Data distributed via a hash function | Even load distribution |
| Composite | Combines two types (e.g., Range + Hash) | Multi-dimensional datasets |

๐Ÿงฑ Horizontal vs Vertical Partitioning:

| Type | Description | Use Case |
|------|-------------|----------|
| Horizontal | Split rows across partitions | Logs, user records, transactions |
| Vertical | Split columns across tables | Sensitive vs non-sensitive data |

โœ… Benefits

  • Faster queries (due to partition pruning)
  • Easier maintenance (backup/drop/archive)
  • Scalability within a single database

โš ๏ธ Drawbacks

  • Added schema complexity
  • Not all DBs support all partition types
  • Uneven data can cause data skew
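As a concrete example, range partitioning in PostgreSQL looks like this (a sketch; the table and column names are made up):

```sql
-- Hypothetical PostgreSQL range partitioning of a sales table by month.
CREATE TABLE sales (
    id      bigint,
    sold_at date NOT NULL,
    amount  numeric
) PARTITION BY RANGE (sold_at);

CREATE TABLE sales_2024_01 PARTITION OF sales
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');

CREATE TABLE sales_2024_02 PARTITION OF sales
    FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');

-- A query that filters on sold_at only scans the matching partition
-- (partition pruning):
SELECT sum(amount) FROM sales
WHERE sold_at >= '2024-01-01' AND sold_at < '2024-02-01';
```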

2๏ธโƒฃ What is Sharding?

โœ… Definition:

Sharding is the process of splitting a dataset across multiple physical databases or servers, each called a shard.

Each shard holds a subset of the entire data and can be queried independently.

๐Ÿ”ง Types of Sharding:

| Type | Description | Use Case |
|------|-------------|----------|
| Horizontal | Different rows in each shard | Large user base split by user_id |
| Vertical | Different tables or services per shard | Microservices with separate schemas |
| Geo-Sharding | Based on geography or region | Global apps (e.g., Asia, EU users) |

๐Ÿงฑ Example:

| Shard | Data Range |
|-------|------------|
| Shard 1 | user_id 1โ€“10 million |
| Shard 2 | user_id 10Mโ€“20 million |
| Shard 3 | user_id 20Mโ€“30 million |

๐Ÿ›  Tools That Support Sharding:

  • MongoDB (built-in)
  • Vitess (MySQL)
  • Citus (PostgreSQL)
  • Cassandra (sharded by design)
  • ElasticSearch (auto-sharding)

โœ… Benefits

  • True horizontal scaling
  • Improved availability & fault isolation
  • Handles very large datasets across regions

โš ๏ธ Drawbacks

  • Complex to implement and maintain
  • Cross-shard joins are difficult
  • Requires careful shard key design
  • Complex backup & consistency management

๐Ÿ” Partitioning vs Sharding: Comparison Table

| Feature | Partitioning | Sharding |
|---------|--------------|----------|
| Scope | Inside one database | Across multiple databases/servers |
| Managed By | Database engine | Application or shard middleware |
| Logical Unit | Table partition | Database/shard |
| Cross-Partition Joins | Supported | Difficult or unsupported |
| Scalability | Limited to one DB machine | Horizontally scalable |
| Use Case | Structured, large tables | Global-scale systems (Facebook, etc.) |

๐Ÿ“Œ Summary

  • Partitioning is suitable for scaling within a single database and improving query performance for large tables.
  • Sharding is ideal for massive-scale, distributed systems that require true horizontal scaling and fault tolerance.

Use the right strategy based on your system's architecture, data volume, and scalability requirements.





๐Ÿงญ Difference Between Observability and Monitoring

| Aspect | Monitoring | Observability |
|--------|------------|---------------|
| ๐Ÿ” Definition | Collecting predefined metrics to track system health | Understanding the internal state of a system by analyzing its outputs |
| ๐ŸŽฏ Goal | Detect known issues and alert when something breaks | Investigate and diagnose unknown or complex issues |
| ๐Ÿ”ง Approach | Reactive: predefined checks and dashboards | Proactive: enables asking new questions and exploring behavior |
| ๐Ÿ”ฌ Focus | Known problems | Unknown unknowns |
| ๐Ÿงฑ Components | Metrics, alerts, dashboards | Metrics + Logs + Traces (3 Pillars of Observability) |
| ๐Ÿ“Š Tools | Prometheus, Nagios, Zabbix | OpenTelemetry, Grafana, Jaeger, Honeycomb |
| ๐Ÿšจ Use case | Alert when CPU > 90% | Understand why latency is increasing randomly |
| ๐Ÿ’ก Analogy | Thermometer shows temperature | Doctor uses symptoms + scans + history to diagnose |

๐Ÿ“ฆ Example

Monitoring:

  • You set a rule: โ€œAlert me if memory usage goes above 90%โ€.
  • You get notified when it does.

Observability:

  • Your app slows down.
  • You don't know why.
  • You dive into metrics, traces, logs โ€“ see a DB call is slow due to network latency.
  • You find a misconfigured load balancer in a specific region.

โœ… Key Takeaway:

Monitoring is a subset of Observability.

Observability is about having enough data and tooling to answer any question about your system, even if you didnโ€™t anticipate the issue in advance.



๐Ÿ“ก What is OpenTelemetry?

OpenTelemetry is a vendor-neutral, open-source observability framework by the CNCF that provides standardized tools to collect, process, and export telemetry data โ€” specifically metrics, logs, and traces โ€” from applications and infrastructure.

It consists of:

  • SDKs for instrumentation, and
  • A collector component that receives telemetry data, processes it (like batching or sampling), and exports it to observability backends like New Relic, Prometheus, Jaeger, or any OTLP-compatible platform.

๐Ÿ’ช Why OpenTelemetry is Powerful

What makes OpenTelemetry powerful is that it decouples telemetry generation from storage or visualization.

You write once using OTel SDKs and can export to any backend without being locked into a vendor.


๐Ÿงช Real-World Example

In my previous project at Janitri, I used OpenTelemetry SDKs in the backend to instrument REST APIs and used the OpenTelemetry Collector to forward metrics to Prometheus.

Logs and traces were optionally integrated via extensions.


๐Ÿ”„ In a New Relic Setup

This same SDK can send data directly to New Relic via the OTLP exporter, giving you full-stack visibility โ€” with no vendor-specific lock-in.


๐ŸŽฏ Conclusion

Thatโ€™s the beauty of OpenTelemetry:

  • Itโ€™s interoperable
  • Itโ€™s future-proof
  • It aligns deeply with New Relicโ€™s support for open standards


๐Ÿ“Š What is Prometheus?

Prometheus is an open-source, time-series database and monitoring system originally developed by SoundCloud and now part of the CNCF (Cloud Native Computing Foundation).

It is designed to collect and store metrics from systems and applications using a pull-based model.


โš™๏ธ How Prometheus Works

  • Prometheus scrapes data from exposed endpoints (typically /metrics).
  • It stores this data in its local time-series database (TSDB).
  • Querying is done using its powerful query language called PromQL.
  • It supports rule-based alerting using its built-in component called Alertmanager.

๐Ÿ“Œ Key Characteristics

| Feature | Description |
|---------|-------------|
| ๐Ÿ”„ Pull-Based Model | Prometheus pulls metrics from targets, instead of targets pushing data |
| ๐Ÿ“ˆ Metric-Focused | Only handles metrics (no support for logs or traces) |
| ๐Ÿง  PromQL | A flexible and powerful query language |
| ๐Ÿšซ No Built-in Clustering | No native clustering or long-term storage out of the box |
| ๐Ÿ”— Extensibility | Can be extended with projects like Thanos or Cortex for high availability |

๐Ÿ‘จโ€๐Ÿ’ป Real-World Example (Janitri Project)

In my project at Janitri, I used Prometheus alongside OpenTelemetry to collect real-time metrics related to API performance.

I visualized this data using Grafana, which gave immediate insights, although the setup required some effort and configuration.


๐Ÿค Why Prometheus with OpenTelemetry?

OpenTelemetry is a telemetry generation and export framework โ€” not a full observability stack.

It collects metrics, logs, and traces from applications using SDKs and exports them to a backend.

Prometheus is one such backend โ€” specialized in metrics.


๐Ÿ” Integration Flow

  1. I used OpenTelemetry SDKs to instrument my application.
  2. Then I used the OpenTelemetry Collector to expose metrics in Prometheus format via the /metrics receiver.
  3. Prometheus scraped this data, stored it, and allowed me to:
    • Query it using PromQL
    • Set up alerts via Alertmanager

๐Ÿ”— Conclusion

Prometheus completed what OpenTelemetry started โ€”

  • ๐Ÿ› ๏ธ OTel was the producer
  • ๐Ÿง  Prometheus was the consumer, storage, and query engine

This architecture was:

  • โœ… Modular
  • ๐Ÿ”„ Flexible
  • ๐Ÿ”ฎ Future-proof

If needed, I could easily swap Prometheus with any OTLP-compatible backend (e.g., New Relic) without changing instrumentation code.

Thatโ€™s the power of combining OpenTelemetry with open, pluggable tools like Prometheus.




๐Ÿงญ Full Observability Stack using OpenTelemetry

This architecture illustrates how telemetry flows from instrumented code all the way to dashboards using tools like OpenTelemetry, Prometheus, Loki, Jaeger, and Grafana.


1๏ธโƒฃ Instrumentation Layer (Your Code)

Add OpenTelemetry SDKs to generate telemetry (metrics, logs, traces).

You can use:

  • Auto-instrumentation agents

    (e.g. for Node.js, Python, Java)

  • Manual instrumentation

    (tracer.startSpan(), meter.record(), etc.)


2๏ธโƒฃ Collector Layer

The OpenTelemetry Collector is the heart of the pipeline:

  • Receives data via receivers
  • Processes data (optional) via processors
  • Sends data to exporters (e.g., Prometheus, Jaeger)

You can run the Collector as:

  • ๐ŸŸข Agent โ€“ runs locally on each host (lightweight)
  • ๐ŸŸฃ Gateway โ€“ centralized telemetry router (common in prod)
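A minimal Collector pipeline wires those three pieces together in YAML (a hedged sketch; endpoints and hostnames are placeholders):

```yaml
# Hypothetical minimal OpenTelemetry Collector pipeline:
# receive OTLP from apps, batch, export metrics to Prometheus and traces to Jaeger.
receivers:
  otlp:
    protocols:
      grpc:
      http:

processors:
  batch:

exporters:
  prometheus:
    endpoint: "0.0.0.0:8889"   # Prometheus scrapes this
  otlp/jaeger:
    endpoint: "jaeger:4317"    # placeholder host
    tls:
      insecure: true

service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [prometheus]
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/jaeger]
```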

3๏ธโƒฃ Backend Layer

These are the specialized storage tools for each data type:

| Data Type | Tool | Purpose |
|-----------|------|---------|
| Metrics | Prometheus | Monitoring, alerting, dashboards |
| Logs | Loki | Log aggregation & searchable logs |
| Traces | Jaeger/Tempo | Distributed tracing & request flow |

These tools store and index the telemetry so that Grafana (or New Relic) can query them.


4๏ธโƒฃ Visualization Layer (Grafana)

  • Grafana connects to:

    • Prometheus (for metrics)
    • Loki (for logs)
    • Jaeger/Tempo (for traces)
  • Unified dashboards for all observability pillars

  • Create alerts (e.g., CPU > 80%, error rate > 5%)

  • Supports full correlation:

    • Logs โ†’ Traces โ†’ Metrics from one screen

๐Ÿง  Key Interview Lines You Can Drop

  • โ€œThe OpenTelemetry Collector acts as a hub where all telemetry โ€” metrics, logs, traces โ€” is routed, transformed, and exported.โ€

  • โ€œGrafana sits on top as the visual UI, but the data lifeblood flows from instrumented apps through OpenTelemetry.โ€

  • โ€œIn a real production setup, this model gives me flexibility: swap out Prometheus with New Relic just by changing the exporter.โ€







Git Merge vs Rebase vs Squash - Complete Guide

The Problem

You have a feature branch with commits A, B, C. Meanwhile, commits D and E have been added to the main branch. What do you do now?

main:     1---2---D---E
               \
feature:        A---B---C

Option 1: Git Merge ๐Ÿ”—

What happens:

git checkout main
git merge feature-branch

Result:

main: 1---2---D---E---M
           \         /
feature:    A---B---C

Simple Explanation:

  • A merge commit (M) is created
  • The history of both branches is preserved
  • The graph gets a "knot"-like structure

When to use:

  • When you want the complete history
  • When team collaboration calls for transparency
  • When you want to track the feature branch's detailed development

Option 2: Git Rebase โ†—๏ธ

What happens:

git checkout feature-branch
git rebase main

Result:

main: 1---2---D---E---A'---B'---C'

Simple Explanation:

  • Moves the feature branch commits onto the "tip" of main
  • Gives you a clean, linear history
  • Original commits A, B, C become A', B', C' (new commit IDs)

When to use:

  • When you want a clean, linear history
  • To avoid complex merge conflicts
  • Preferred in professional projects

Option 3: Squash Commits ๐Ÿ—œ๏ธ

What happens:

git checkout main
git merge --squash feature-branch
git commit -m "Add complete feature X"

Result:

main: 1---2---D---E---S

Simple Explanation:

  • Combines all feature commits (A+B+C) into a single commit (S)
  • Main shows just one clean commit
  • The detail of individual commits is lost on main

When to use:

  • When you want a clean history on main
  • When feature development details aren't needed on main
  • A popular approach on GitHub/GitLab

Real World Scenarios ๐ŸŒ

Scenario 1: Small Personal Project

Use: Simple merge

  • History complexity doesn't matter
  • Quick and easy

Scenario 2: Professional Team Project

Use: Rebase + Fast-forward merge

  • Clean linear history
  • Easy to track changes
  • Professional appearance

Scenario 3: Open Source Project

Use: Squash commits

  • The main branch stays clean
  • Contributors' detailed work stays preserved in the feature branch
  • Easy to review and rollback

Commands Summary ๐Ÿ“

Merge:

git checkout main
git merge feature-branch

Rebase:

git checkout feature-branch
git rebase main
git checkout main
git merge feature-branch  # Fast-forward merge

Squash:

git checkout main
git merge --squash feature-branch
git commit -m "Descriptive message"
