
Backend Interview Sheet

1. What is Docker, and Why is it Used?

Docker is an open-source containerization platform that allows developers to package applications and their dependencies into isolated environments called containers. These containers ensure that applications run consistently across different environments.

🔹 Real-Life Example:

Imagine you're developing a MERN stack web app. It works fine on your laptop, but when your teammate runs it, they get "version mismatch" errors.

With Docker, you create a consistent environment across all machines, preventing such issues.

✅ Why Use Docker?

Docker is beneficial when you need:

  • Portability → Works on any OS without compatibility issues
  • Consistency → Eliminates "It works on my machine" problems
  • Lightweight → Uses fewer system resources than virtual machines
  • Scalability → Quickly scale applications with minimal overhead

2. Main Components of Docker

🛠️ 1. Docker Daemon (dockerd)

  • The background process that manages Docker containers
  • Listens for API requests and handles images, networks, and volumes

💻 2. Docker CLI (Command-Line Interface)

  • A tool to interact with the Docker Daemon
  • Common commands:
  docker ps        # List running containers  
  docker run       # Start a new container  
  docker stop      # Stop a running container  

📦 3. Docker Images

  • A read-only template containing the application, libraries, and dependencies
  • Immutable → Once built, images don’t change
  • Used to create containers

📌 4. Docker Containers

  • A running instance of a Docker image
  • Isolated from the host system but can interact if needed (e.g., exposing ports)

🌐 5. Docker Hub

  • A cloud-based registry where Docker images are stored and shared

🗂️ 6. Docker Volumes

  • Used for persistent data storage outside of containers

📌 Illustration of Docker Components:

[Diagram: Docker architecture showing the Daemon, CLI, Images, Containers, and Volumes]


3. How is Docker Different from Virtual Machines?

⚡ Example:

You're testing a React.js + Express.js app. Instead of running a full Ubuntu VM (which consumes high RAM & CPU), you start a lightweight container in seconds:

docker run -d -p 3000:3000 node:16

Unlike a VM, which takes minutes to boot, a container starts instantly.

🆚 Docker vs. Virtual Machines

| Feature | Docker (Containers) | Virtual Machines (VMs) |
|---|---|---|
| Boot Time | Seconds | Minutes |
| Size | MBs | GBs |
| Performance | Near-native speed | Slower due to hypervisor overhead |
| Isolation | Process-level isolation | Full OS-level isolation |
| Resource Efficiency | Shares OS kernel, lightweight | Requires full OS, resource-intensive |

docker run vs. docker start vs. docker exec

docker run : Create and start a new container from an image
docker start : Restart an existing, stopped container
docker exec : Run a command inside an already-running container
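
For example, using a throwaway nginx container:

# Create and start a new container from the nginx image
docker run -d --name web nginx

# Stop it, then restart the same container (its state is preserved)
docker stop web
docker start web

# Run a command inside the running container
docker exec -it web bash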


4. Popular and Useful Docker Commands

Here are some of the most commonly used Docker commands:

🔍 Container Management

# List all running containers
docker ps  

# List all containers (including stopped ones)
docker ps -a  

# Start a stopped container
docker start <container_id>  

# Stop a running container
docker stop <container_id>  

# Remove a container
docker rm <container_id>  

🏗 Image Management

# List all available images
docker images  

# Pull an image from Docker Hub
docker pull <image_name>  

# Remove an image
docker rmi <image_name>  

📦 Build and Run Containers

# Build a Docker image from a Dockerfile
docker build -t <image_name> .  

# Run a container from an image
docker run -d -p 8080:80 <image_name>  

📂 Volume Management

# List all Docker volumes
docker volume ls  

# Create a new volume
docker volume create <volume_name>  

# Remove a volume
docker volume rm <volume_name>  

Docker Compose: docker-compose.yml

What is docker-compose.yml?

The docker-compose.yml file is used to define and run multi-container Docker applications. With Docker Compose, you can manage and orchestrate multiple services, including databases, backend APIs, and front-end applications, all in a single file.

It allows you to define services, networks, and volumes, making it easier to deploy and manage applications that require multiple services working together.


Why is docker-compose.yml Useful?

  1. Simplifies Multi-Container Management:
    Instead of managing each container manually, Docker Compose allows you to define all services (frontend, backend, database, etc.) in one configuration file and launch them with a single command.

  2. Networking and Dependency Management:
    Docker Compose automatically creates a network for your containers, allowing them to communicate with each other. Services can be referenced by their service name, which means the backend can talk to the database without needing an IP address.

  3. One Command to Start Everything:
    Instead of running individual containers with complex docker run commands, Docker Compose lets you define the services and their dependencies in a YAML file, and run everything with docker-compose up.

  4. Simplified Development Environment:
    With Docker Compose, developers can easily replicate the production environment locally, using the same configuration for services like databases, backends, and frontends. It allows seamless integration and testing, as you don't have to manually set up each service.

  5. Environment Variable Management:
    You can manage environment variables for each service within the docker-compose.yml file, making it easier to configure your application for different environments (development, testing, production).


Example of docker-compose.yml for a Web Application

Let’s walk through an example where we have three services:

  • Frontend: A React app running on port 3000.
  • Backend: A Node.js API running on port 5000.
  • Database: A MongoDB instance to store data.

version: '3.8'

services:
  frontend:
    build: ./frontend
    ports:
      - "3000:3000"
    volumes:
      - ./frontend:/app
    depends_on:
      - backend

  backend:
    build: ./backend
    ports:
      - "5000:5000"
    environment:
      - NODE_ENV=development
    depends_on:
      - database

  database:
    image: mongo
    volumes:
      - mongo-data:/data/db
    ports:
      - "27017:27017"

volumes:
  mongo-data:

Database Migrations

  1. Explain how you would design and manage a database schema using Sequelize, including the process of setting up migrations, handling model relationships, optimizing for performance, and managing database changes in a collaborative team environment.

Database Migration with Sequelize

Purpose

Database migrations allow you to safely update and manage your database schema over time. They help track changes to the schema in a version-controlled manner, making it easy to collaborate in teams.

Setting Up Migrations

  1. Initialize Sequelize with sequelize-cli to generate migration files.
  2. Migration files contain two primary methods:
    • up: For applying changes (e.g., create tables, add columns).
    • down: For rolling back changes (undoing the applied changes).
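
A minimal migration file might look like this (a sketch; the table and columns are illustrative):

'use strict';

module.exports = {
  // up: apply the change (create the Users table)
  async up(queryInterface, Sequelize) {
    await queryInterface.createTable('Users', {
      id: { type: Sequelize.INTEGER, autoIncrement: true, primaryKey: true },
      email: { type: Sequelize.STRING, allowNull: false, unique: true },
      createdAt: { type: Sequelize.DATE, allowNull: false },
      updatedAt: { type: Sequelize.DATE, allowNull: false }
    });
  },

  // down: roll the change back (drop the table)
  async down(queryInterface) {
    await queryInterface.dropTable('Users');
  }
};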

Handling Schema Changes

  • Creating Migrations:
    When you need to add, modify, or delete database schema (e.g., tables, columns), you create a new migration file.

  • Applying Migrations:
    Use the command npx sequelize-cli db:migrate to apply migrations to the database.

  • Rolling Back Migrations:
    Use npx sequelize-cli db:migrate:undo to undo the last applied migration.

Model Relationships

  • Define associations (e.g., one-to-many, many-to-many) within your models using Sequelize methods:
    • hasOne, hasMany, belongsTo, belongsToMany, etc.
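
For example, a one-to-many association (model names are illustrative):

// A User has many Posts; each Post belongs to one User
User.hasMany(Post, { foreignKey: 'userId' });
Post.belongsTo(User, { foreignKey: 'userId' });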

Collaborative Workflow

  1. Migrations should be version-controlled using Git.
  2. Each team member works with migrations, and when schema changes are required, new migrations are created and applied across all environments (development, staging, production).


GitHub Actions


Steps to Deploy on AWS EC2

1. Launch EC2 Instance

2. Add Secret Variables in GitHub

  • Go to GitHub Repo Settings → Secrets and Variables → Actions → Add Secret

3. Connect to EC2 Instance

Install Docker on AWS EC2
sudo apt-get update
sudo apt-get install docker.io -y
sudo systemctl start docker
sudo chmod 666 /var/run/docker.sock
sudo systemctl enable docker
docker --version
docker ps

4. Create Two Runners on the Same EC2 Instance

  • In the React App repo → Actions → Runner → New Self-Hosted Runner
  • Copy the download commands and run them in the EC2 instance terminal
  • Install it as a service to keep it running in the background
sudo ./svc.sh install
sudo ./svc.sh start
  • Do the same for the Node.js Runner

5. Create a Dockerfile for Node.js (Backend)
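
A minimal sketch of such a Dockerfile (the entry file and port are assumptions; adjust to your project):

# Example Dockerfile for a Node.js backend
FROM node:18-alpine
WORKDIR /app

# Install dependencies first so Docker can cache this layer
COPY package*.json ./
RUN npm ci --omit=dev

# Copy the application code and expose the API port
COPY . .
EXPOSE 5000
CMD ["node", "index.js"]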

6. Create a GitHub Actions Workflow

Create a .github/workflows/cicd.yml file
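
A sketch of what this workflow could look like (image names, ports, and secret names are assumptions, not the exact original):

name: CICD

on:
  push:
    branches: [main]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Build the backend image and push it to DockerHub
      - run: docker login -u ${{ secrets.DOCKER_USERNAME }} -p ${{ secrets.DOCKER_PASSWORD }}
      - run: docker build -t ${{ secrets.DOCKER_USERNAME }}/backend:latest .
      - run: docker push ${{ secrets.DOCKER_USERNAME }}/backend:latest

  deploy:
    runs-on: self-hosted   # the runner installed on the EC2 instance
    needs: build
    steps:
      # Pull the fresh image and replace the running container
      - run: docker pull ${{ secrets.DOCKER_USERNAME }}/backend:latest
      - run: docker rm -f backend || true
      - run: docker run -d -p 5000:5000 --name backend ${{ secrets.DOCKER_USERNAME }}/backend:latest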


7. Push Docker Images to DockerHub

8. Add Inbound/Outbound Rules on EC2 Instance

9. Access the Node.js Application

  • Use EC2_PUBLIC_IP:PORT to access your application

Deploying React App

  • Create a Dockerfile for React
  • Follow the same process as above

What is GitHub Actions, and how does it work?

GitHub Actions is a CI/CD automation tool that allows you to define workflows in YAML to build, test, and deploy applications directly from GitHub repositories.

How do you trigger a GitHub Actions workflow?

Workflows can be triggered by events such as push, pull_request, schedule, workflow_dispatch, and repository_dispatch.

What are the key components of a GitHub Actions workflow?

Key components include:

  • Workflows (YAML files in .github/workflows/)
  • Jobs (Independent execution units in a workflow)
  • Steps (Commands executed in a job)
  • Actions (Reusable units of functionality)
  • Runners (Machines that execute jobs)

What is the difference between jobs, steps, and actions?

  • Jobs: Run in parallel or sequentially within a workflow.
  • Steps: Individual tasks executed within a job.
  • Actions: Pre-built reusable components within steps.

How do you use environment variables and secrets in GitHub Actions?

  • Define environment variables using env:
  env:
    NODE_ENV: production
  • Store sensitive values in secrets:
  env:
    API_KEY: ${{ secrets.API_KEY }}

What are self-hosted runners, and when should you use them?

Self-hosted runners are custom machines used to execute workflows instead of GitHub's hosted runners. Use them for private repositories, custom hardware, or specific dependencies.

How do you cache dependencies in GitHub Actions?

Use actions/cache@v3 to cache dependencies and speed up builds:

- uses: actions/cache@v3
  with:
    path: ~/.npm
    key: npm-${{ runner.os }}-${{ hashFiles('**/package-lock.json') }}
    restore-keys: npm-${{ runner.os }}

How do you create a reusable workflow in GitHub Actions?

Define a workflow with on: workflow_call and call it from another workflow:

on: workflow_call
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - run: echo "Reusable workflow"

How do you set up a CI/CD pipeline using GitHub Actions?

Define a workflow that includes jobs for building, testing, and deploying:

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - run: echo "Building..."
  test:
    runs-on: ubuntu-latest
    steps:
      - run: echo "Testing..."
  deploy:
    runs-on: ubuntu-latest
    needs: test
    steps:
      - run: echo "Deploying..."

What is the difference between workflow_dispatch, workflow_run, and schedule triggers?

  • workflow_dispatch: Manual trigger via GitHub UI/API.
  • workflow_run: Triggered when another workflow finishes.
  • schedule: Runs workflows at specific times using cron syntax.

How do you debug a failing GitHub Actions workflow?

  • Check logs in GitHub Actions UI.
  • Use set -x in bash scripts for verbose output.
  • Add continue-on-error: true to isolate issues.

How do you run a GitHub Actions workflow locally?

Use act, a tool that simulates GitHub Actions on your local machine:

act

How do you optimize and speed up GitHub Actions workflows?

  • Use caching (actions/cache@v3).
  • Run jobs in parallel when possible.
  • Use matrix builds for different environments.
  • Limit workflow execution to necessary branches.

How do you manage permissions and security in GitHub Actions?

  • Apply the principle of least privilege to tokens (GITHUB_TOKEN).
  • Restrict secrets exposure to trusted workflows.
  • Use branch protection rules to limit workflow execution.


Websockets & Multi-backend system

Why Do Backends Need to Talk to Each Other?


In a typical client-server architecture, communication happens between the browser (client) and the backend server. However, as applications grow, keeping everything on a single server exposed to the internet becomes inefficient and unscalable.

When designing a multi-backend system, you need to consider:

  • If there are multiple services, how should they communicate when an event occurs?
  • Should it be an immediate HTTP call?
  • Should the event be sent to a queue?
  • Should the services communicate via WebSockets?
  • Should you use a Pub-Sub mechanism?

These decisions impact performance, scalability, and reliability.

Multi-Backend Communication - Final Interview Script

Question: "How do you handle communication between multiple backend services?"


Your Answer:

"When designing multi-backend systems, we have four main communication patterns, each serving different use cases.

1. HTTP/REST - Synchronous Communication

This is direct API calls between services. For example, when a user places an order, the User Service calls Order Service, which then calls Payment Service immediately.

Use case: When you need immediate response and strong consistency, like user authentication or payment validation.

Pros: Simple to implement, immediate feedback, strong consistency
Cons: Tight coupling; if one service fails, the whole chain breaks

2. Message Queues - Asynchronous 1:1

Here we use message brokers like RabbitMQ or Amazon SQS. Messages are placed in queues and consumers pick them up when ready. It's point-to-point communication - only one consumer gets each message.

Use case: Task distribution, background job processing, load balancing
Example: Multiple payment workers processing payment requests from a queue

Pros: Loose coupling, fault tolerance, load balancing
Cons: Eventual consistency, more complex error handling

3. Pub-Sub - Event Broadcasting 1:N

Publishers send events to topics, and multiple subscribers listen to the same topic. Same message goes to all subscribers.

Use case: Event-driven architecture where multiple services need to react to same event
Example: When an order is created, the Inventory Service updates stock, the Email Service sends a confirmation, and Analytics tracks metrics, all from the same event

Pros: Highly decoupled, easy to add new features, scalable
Cons: Message ordering challenges, duplicate handling needed

4. WebSockets - Real-time Communication

Persistent bidirectional connections for real-time communication.

Use case: Chat applications, live updates, gaming
Pros: Real-time, low latency
Cons: Resource intensive, connection management complexity


Key Difference - Queue vs Pub-Sub:

Both have the same components: a Publisher/Producer, a Broker, and a Consumer/Subscriber. The difference is in message delivery:

  • Queue: 1:1 - Messages compete, only one consumer gets each message
  • Pub-Sub: 1:N - Same message broadcasted to all subscribers

Real Example - E-commerce System:

I would use a hybrid approach:

  1. User places order - HTTP call for immediate validation
  2. Order processing - Pub-Sub event 'ORDER_CREATED' to notify multiple services
  3. Background tasks - Queue for heavy processing like report generation

Technology Stack:

  • Apache Kafka - Can work as both queue and pub-sub
  • RabbitMQ - For reliable message queuing
  • Redis Pub-Sub - For simple event broadcasting
  • Amazon SQS/SNS - For managed cloud solutions

Decision Framework:

Choose HTTP when: Need immediate response, strong consistency, simple flows
Choose Queues when: Task distribution, load balancing, background processing
Choose Pub-Sub when: Multiple services need same event, event-driven architecture
Choose WebSockets when: Real-time bidirectional communication needed


Production Considerations:

  • Error Handling: Circuit breakers, dead letter queues, retry mechanisms
  • Monitoring: Queue depths, processing times, error rates
  • Scalability: Horizontal scaling of consumers, proper partitioning

The key is choosing the right pattern for each specific use case rather than using one approach everywhere."


If Asked Follow-up Questions:

"What about data consistency?"

"For strong consistency, use HTTP calls. For eventual consistency, use async patterns with proper error handling and compensation transactions."

"How do you handle failures?"

"Circuit breakers for HTTP, dead letter queues for messages, retry mechanisms with exponential backoff, and proper monitoring."

"Which technology would you choose?"

"Kafka for high throughput and both queue/pub-sub needs, RabbitMQ for complex routing, SQS for simple cloud solutions."


Example: Payment Processing System


Let's consider a payment application. When a transaction occurs:

  1. The database update should happen immediately (synchronous).
  2. The notification (email/SMS) can be pushed to a queue (asynchronous).

Why not handle everything in the primary backend?

  • If the email service is down, should the user be forced to wait after completing the transaction? No!
  • Instead, we push the notification event to a queue.
  • Even if the notification service is down, the queue retains the event and sends notifications once the service is back.
  • This is why message queues (e.g., RabbitMQ, Kafka, AWS SQS) are better than HTTP for such tasks.
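
A sketch of this enqueue step with RabbitMQ via the amqplib package (the queue name and payload are illustrative):

const amqp = require('amqplib');

// Producer: the payment backend enqueues a notification event
async function enqueueNotification(event) {
  const conn = await amqp.connect('amqp://localhost');
  const channel = await conn.createChannel();
  await channel.assertQueue('notifications', { durable: true });
  channel.sendToQueue('notifications', Buffer.from(JSON.stringify(event)), { persistent: true });
  await channel.close();
  await conn.close();
}

// The transaction completes immediately; the email/SMS is sent whenever
// the (possibly temporarily down) notification service consumes the queue
enqueueNotification({ type: 'PAYMENT_SUCCESS', userId: 123 });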

Types of Communication

  1. Synchronous Communication

    • The system waits for a response from the other system.
    • Examples: HTTP requests, WebSockets (in some cases).
  2. Asynchronous Communication

    • The system does not wait for a response.
    • Examples: Message queues, Pub-Sub services.

Why WebSockets?

WebSockets provide persistent, full-duplex communication over a single TCP connection, established with one handshake.

Limitations of HTTP:

  • In HTTP, the server cannot push events to the client on its own.
  • The client (browser) can request, and the server can respond, but the server cannot initiate communication with the client.

WebSockets vs. HTTP for Real-Time Applications

Example: Stock Market Trading System

  • Stock buying & selling generates millions of requests per second.
  • If you use HTTP, every request requires a three-way handshake, adding latency and overhead.
  • With WebSockets, the handshake happens only once, and then the server and client can continuously exchange data.

Alternative: Polling

If you still want to use HTTP for real-time updates, an alternative approach is polling.

  • However, polling creates unnecessary load on the server by making frequent requests.
  • WebSockets are a more efficient solution for real-time updates.
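
A minimal WebSocket server sketch using the ws package (the port and payload are illustrative):

const { WebSocketServer } = require('ws');

const wss = new WebSocketServer({ port: 8080 });

wss.on('connection', (socket) => {
  // The server can now push data to the client at any time,
  // something plain HTTP cannot do
  socket.send(JSON.stringify({ type: 'PRICE_UPDATE', symbol: 'AAPL', price: 190.12 }));

  socket.on('message', (data) => {
    console.log('received:', data.toString());
  });
});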


Some Basic Questions

Basic

What is Node.js?

Node.js is a runtime environment for executing JavaScript on the server side. It is not a framework or a language. A runtime is responsible for memory management and converting high-level code into machine code.

Examples:

  • Java: JVM (Runtime) → Spring (Framework)
  • Python: CPython (Runtime) → Django (Framework)
  • JavaScript: Node.js (Runtime) → Express.js (Framework)

With Node.js, JavaScript can run outside the browser as well.

Runtime vs Frameworks

  • Runtime: Focuses on executing code, handling memory, and managing I/O.
  • Framework: Provides structured tools and libraries to simplify development.

What happens when you enter a URL in the browser and hit enter?

DNS Lookup

The browser checks if it already knows the IP address for www.example.com. If not, it contacts a DNS (Domain Name System) server to get the IP address (e.g., 192.168.1.1).

Establishing Connection

The browser initiates a TCP connection with the web server using a process called the three-way handshake. If the website uses HTTPS, a TLS handshake happens to encrypt the communication.

Sending HTTP Request

The browser sends an HTTP request to the server:

GET / HTTP/1.1
Host: www.example.com

Server Processing

The web server processes the request and may fetch data from a database and generate a response (HTML, JSON, etc.).

Receiving the Response

The server sends an HTTP response back to the browser:

HTTP/1.1 200 OK
Content-Type: text/html

Rendering the Page

The browser processes the HTML, CSS, and JavaScript and displays the webpage.

Difference Between Monolithic and Microservices Architecture

Monolithic Architecture

  • All components (UI, DB, Auth, etc.) are tightly coupled.
  • Single application handles everything.

Microservices Architecture

  • Divided into small, independent services.
  • Each service handles a specific function (Auth, Payments, etc.).

Pros:

  • Scalable
  • Services can use different tech stacks

Cons:

  • More complex to manage
  • Requires API communication

HTTP Status Codes

  • 200 OK
  • 201 Created
  • 400 Bad Request
  • 401 Unauthorized
  • 402 Payment Required
  • 404 Not Found
  • 405 Method Not Allowed
  • 500 Internal Server Error

What is CORS?

CORS stands for Cross-Origin Resource Sharing, a security feature built into browsers.
It blocks requests from one origin (domain, protocol, or port) to another origin unless the server explicitly allows them.
For example: your frontend is hosted at frontend.com and your backend at backend.com.
The browser treats these as different origins and blocks the request unless it is explicitly allowed.
Why does this happen? CORS errors are triggered by the Same-Origin Policy, which prevents malicious websites from making unauthorized API calls using your credentials.

The browser isn't blocking the request; it's blocking the response, for security reasons.
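
On the server, you fix this by explicitly allowing the origin. A minimal Express sketch using the cors middleware (the origin URL is illustrative):

const express = require('express');
const cors = require('cors');

const app = express();

// Allow cross-origin requests from the frontend's origin only
app.use(cors({ origin: 'https://frontend.com' }));

app.get('/api/data', (req, res) => res.json({ ok: true }));

app.listen(5000);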

REST vs GraphQL

REST API:

"REST (Representational State Transfer) is an architectural style where data is fetched using multiple endpoints, and each request returns a fixed structure of data."

GraphQL:

"GraphQL is a query language for APIs that allows clients to request only the data they need, reducing overfetching and underfetching."

💡 Key Point:

  • REST APIs have multiple endpoints (/users, /orders), while GraphQL has a single endpoint (/graphql).
  • GraphQL provides more flexibility by allowing clients to request exactly what they need in a single query.
  • REST APIs return predefined responses and sometimes require multiple requests.
  • If performance and flexibility are key concerns, GraphQL is a better choice.

How Do You Design an API for a Large-Scale System?

  • Use Microservices: Separate services (Auth, Payments, etc.).
  • Load Balancers: Distribute traffic efficiently.
  • Caching: Use Redis for frequently accessed data.
  • Pagination: Send data in chunks.
  • Rate Limiting: Prevent API abuse.
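
For instance, rate limiting can be added to an Express API with the express-rate-limit middleware (a sketch; the limits are illustrative):

const express = require('express');
const rateLimit = require('express-rate-limit');

const app = express();

// Allow at most 100 requests per IP per 15-minute window
app.use(rateLimit({ windowMs: 15 * 60 * 1000, max: 100 }));

app.listen(5000);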

What is Pagination? How to Implement It?

Pagination breaks large datasets into smaller parts.
Implementation:

  • Use limit and offset in database queries.
  • Example:
  SELECT * FROM users LIMIT 10 OFFSET 20;
  • Use cursor-based pagination for better performance.
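
A cursor-based version remembers the last row seen instead of counting an offset (the column values are illustrative):

-- Fetch the next 10 users after the last seen id (the "cursor")
SELECT * FROM users WHERE id > 20 ORDER BY id LIMIT 10;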

How Do You Handle File Uploads?

  • Single file upload: Use multipart/form-data with Express.js & Multer.
  • Large file handling: Use chunked uploads.
  • Storage options: Store files on AWS S3, Google Cloud Storage, or a database.
  • Server-side Upload: The file is uploaded to your backend server first, and then the server sends it to S3 or Cloudinary.

JWT - Final Interview Answer Script

Question: "What is JWT? How does it work?"


Your Complete Answer:

"JWT stands for JSON Web Token. It's a stateless authentication mechanism where user information is encoded in a token that can be verified without storing session data on the server.

JWT Structure - 3 Parts:

JWT has three parts separated by dots:
header.payload.signature

1. Header: Contains metadata about the token

{
  "alg": "HS256",    // Algorithm used
  "typ": "JWT"       // Token type
}

2. Payload: Contains user information and claims

{
  "userId": 123,
  "role": "admin",
  "exp": 1640995200    // Expiry timestamp
}

3. Signature: Ensures token integrity and authenticity

  • Created by signing the header + payload with a secret key
  • Used to verify token hasn't been tampered with

How JWT Authentication Works:

Step 1 - User Login:

  • User sends credentials to server
  • Server validates credentials
  • If valid, server creates JWT token

Step 2 - Token Creation:

  • Server creates header and payload
  • Server generates signature using secret key: HMAC-SHA256(header.payload, secretKey)
  • All three parts are combined: header.payload.signature

Step 3 - Token Usage:

  • Server sends token to client
  • Client stores token (localStorage or cookie)
  • Client sends token in Authorization header for API requests

Step 4 - Token Verification:

  • Server receives token with request
  • Server splits token into three parts
  • Server recreates signature using same secret key
  • If signatures match, token is valid
  • Server extracts user info from payload
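
A sketch of steps 2 and 4 using the jsonwebtoken package (the secret and claims are illustrative):

const jwt = require('jsonwebtoken');

const SECRET = process.env.JWT_SECRET; // never hardcode or expose this

// Step 2: create a signed token at login
const token = jwt.sign({ userId: 123, role: 'admin' }, SECRET, { expiresIn: '15m' });

// Step 4: verify the token on each request
try {
  const payload = jwt.verify(token, SECRET); // throws if tampered with or expired
  console.log(payload.userId); // 123
} catch (err) {
  // invalid signature or expired token → respond with 401
}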

Key Benefits:

Stateless: No need to store session data on server
Scalable: Works across multiple servers
Self-contained: All user info is in the token
Cross-domain: Can work across different domains

Security Considerations:

Secret Key: Never expose the secret key used for signing
Expiry: Always set short expiry times (15-30 minutes)
HTTPS: Always use HTTPS to prevent token interception
Storage: Be careful about XSS if storing in localStorage

Real-world Example:

When user logs into an e-commerce site:

  1. User enters username/password
  2. Server validates and creates JWT with user ID, role, expiry
  3. Client stores JWT and sends it with every API call
  4. Server verifies JWT and processes request
  5. When token expires, user needs to login again or refresh token

JWT vs Sessions:

JWT:

  • Stateless (no server storage)
  • Better for APIs and microservices
  • Self-contained

Sessions:

  • Stateful (server stores session data)
  • Better for traditional web apps
  • More secure (data on server)

The choice depends on your architecture - use JWT for REST APIs and distributed systems, sessions for traditional web applications."


If Asked Follow-up Questions:

"How do you handle token expiry?"

"Use refresh tokens. Short-lived access tokens (15 mins) with longer-lived refresh tokens (7 days). When access token expires, use refresh token to get new access token."

"What if someone steals the JWT?"

"That's why we use short expiry times, HTTPS only, and httpOnly cookies when possible. Also implement token blacklisting for logout."

"Can JWT be modified?"

"If someone modifies the payload, the signature won't match because they don't have the secret key. Server will reject the token."

"Where do you store JWT on client?"

"For web apps: httpOnly cookies for security, or localStorage for convenience but with XSS risk. For mobile: secure storage."


Question: "Explain Cookies, Sessions, Tokens, and Local Storage for authentication."


Your Answer:

"These are four different ways to handle user authentication and data storage. Let me explain each:

1. COOKIES - Automatic Browser Storage

What it is:
Cookies are small pieces of data that the server sends to the browser, and the browser automatically sends them back with every request.

How it works:

  • Server creates cookie and sends to browser
  • Browser stores it automatically
  • Browser includes cookie in every HTTP request to that domain
  • Server reads cookie data from request

Authentication use:

User logs in → Server creates cookie: authId=abc123 → Browser stores it → 
Every request includes: Cookie: authId=abc123 → Server validates cookie

Example: When you login to Facebook, server sets cookie with session ID. Now every page you visit automatically sends this cookie.


2. SESSIONS - Server-Side Storage

What it is:
Session is user data stored on the server, identified by a session ID that's typically stored in a cookie.

How it works:

  • User logs in → Server creates session data in memory/database
  • Server generates unique session ID
  • Session ID is sent to browser via cookie
  • Browser sends session ID back with requests
  • Server looks up session data using this ID

Authentication flow:

Login → Server creates: sessions[abc123] = {userId: 456, role: 'admin'} →
Cookie: sessionId=abc123 → Server uses ID to fetch user data

Example: Traditional web applications where user data is stored on server for security.


3. TOKENS (JWT) - Self-Contained Authentication

What it is:
A token is an encoded string containing user information that can be verified without storing anything on the server.

How JWT works:

  • Contains 3 parts: Header.Payload.Signature
  • Payload has user info (userId, role, expiry)
  • Signature ensures token hasn't been tampered with
  • Server can verify token without database lookup

Authentication flow:

Login → Server creates JWT token with user info → Client stores token →
Client sends: Authorization: Bearer <token> → Server verifies signature

Example: REST APIs where each request includes JWT token in Authorization header.


4. LOCAL STORAGE - Browser Client Storage

What it is:
Browser's built-in storage that persists data locally, accessible via JavaScript.

How it works:

  • JavaScript can store/retrieve data: localStorage.setItem('token', 'abc123')
  • Data persists even after browser closes
  • Available to JavaScript on same domain
  • 5-10MB storage capacity

Authentication use:

Login → Store token: localStorage.setItem('authToken', token) →
API calls → Get token: localStorage.getItem('authToken') →
Send manually: headers: { Authorization: 'Bearer ' + token }

Example: Single Page Applications (SPAs) where JavaScript manages authentication.


Key Differences Summary:

Storage Location:

  • Cookies: Browser (managed automatically)
  • Sessions: Server-side (secure)
  • Tokens: Client-side (self-contained)
  • Local Storage: Browser (manual JavaScript)

Security:

  • Cookies: Can be HttpOnly (XSS safe), but CSRF risk
  • Sessions: Most secure (data on server)
  • Tokens: Stateless but vulnerable if stolen
  • Local Storage: Vulnerable to XSS attacks

Usage:

  • Cookies: Automatic with every request
  • Sessions: Server looks up data using session ID
  • Tokens: Manual inclusion in headers
  • Local Storage: Manual JavaScript handling

When to Use What:

Use Cookies + Sessions when:

  • Traditional web applications
  • Maximum security needed
  • Server-side rendering
  • Simple user flows

Use Tokens (JWT) when:

  • REST APIs
  • Mobile applications
  • Microservices architecture
  • Need stateless authentication

Use Local Storage when:

  • Single Page Applications (SPAs)
  • Need persistent client-side data
  • Want manual control over auth flow
  • Client-side JavaScript frameworks



Intermediate

What is full text search?

Full-text search matches words and phrases inside natural-language text (using tokenization, stemming, and relevance ranking) rather than exact string equality; common implementations include PostgreSQL/MySQL full-text indexes and ElasticSearch.

What are Serverless and Serverful Backends?

A serverful backend means you manage the entire server, while a serverless backend means you don't manage servers at all: your code runs only when needed on cloud platforms like AWS Lambda.
Example: Imagine you are building a food delivery app like Zomato or Uber Eats.

If you use a serverful backend:

  • You set up an Express.js server on AWS EC2.
  • The server is always running, handling all API requests like fetching restaurants, placing orders, and tracking deliveries.
  • You pay for the server 24/7, even when there are no active users.

If you use a serverless backend:

  • You use AWS Lambda functions to handle API requests.
  • When a user places an order, the function runs only for that request and then shuts down.
  • You only pay for execution time, making it cost-effective.

Can you explain single-threaded vs. multi-threaded processing?

Single-threaded programs execute one task at a time, while multi-threaded programs can execute multiple tasks in parallel. However, single-threaded systems can still be asynchronous using event loops, like in Node.js. If I were building a CPU-intensive app like a video editor, I’d go with multi-threading. But for an API server handling multiple users, I’d use a single-threaded, asynchronous model like Node.js to handle requests efficiently

🧠 Web Server Request Handling – Full Interview Deep Dive

Understand how web servers handle various types of requests, what part of the system gets triggered, and why CPU, disk, and memory are used in different ways.


🔹 Case 1: Static File Request (e.g., GET /index.html)

🧱 Architecture:

Client → Web Server (Nginx, Apache) → Disk

| Step | Description | CPU Used? | Why |
|---|---|---|---|
| 1 | TCP Connection Establishment | | OS uses CPU threads to handle the new socket connection |
| 2 | TLS Handshake (if HTTPS) | ✅✅ | Public-key crypto (RSA/ECC), key exchange – very CPU intensive |
| 3 | HTTP Request Parsing | | Server reads headers, URL, method |
| 4 | Check In-Memory Cache | ⚠️ Sometimes | If the file is cached, skip disk I/O (saves time and CPU) |
| 5 | Disk I/O – Read File | ⚠️ + I/O | Slowest part if uncached (mechanical disk = even slower) |
| 6 | Build HTTP Response | | Add headers, content-type, status, etc. |
| 7 | Send Response (TCP Send) | | Network stack and syscalls involve CPU |

✅ Conclusion:

  • Mostly I/O bound, but CPU handles parsing & networking
  • With HTTPS, CPU spikes due to encryption

🔹 Case 2: Dynamic Request (Backend involved)

e.g., GET /profile?id=10

🧱 Architecture:

Client → Web Server → Backend Server → DB

| Step | Description | CPU Used? | Why |
|---|---|---|---|
| 1 | TCP + TLS Handshake | ✅✅ | Same as static case |
| 2 | Request Parsing | | Headers, query params |
| 3 | Reverse Proxy to Backend | | Web server forwards via IPC/port |
| 4 | Backend App Logic | ✅✅ | Routing, auth, business logic (CPU heavy) |
| 5 | Database Query | ⚠️ CPU + I/O | Reads/writes involve disk and DB engine CPU |
| 6 | Response Generation (HTML/JSON) | ✅✅ | Templating or serialization is CPU-bound |
| 7 | Send Response → Client | | Network transmission |

✅ Conclusion:

  • This is both CPU + I/O bound
  • More cores help in scaling
  • Backend does the heavy lifting, web server is just the router

🔹 Case 3: Cached Response

🧱 Architecture:

Client → Web Server → Cache (Redis/Memcached/internal) → Client

| Step | Description | CPU Used? | Why |
|---|---|---|---|
| 1 | TCP + HTTP Parsing | | Normal |
| 2 | Cache Lookup (Memory) | ⚠️ | Fast RAM lookup, nearly no disk or backend call |
| 3 | Response Ready → Send | | Minimal CPU for sending back |

✅ Conclusion:

  • Fastest flow among all
  • Skips backend & disk I/O → highly efficient
  • Caching = performance booster

🔹 Case 4: Reverse Proxy (Static + Dynamic Mix)

🧱 Architecture:

Client → Nginx (Reverse Proxy) → Static OR Backend

| Step | Description | CPU Used? | Why |
|---|---|---|---|
| 1 | Request to Nginx | | Parses incoming request |
| 2 | Nginx Checks Routes | | Matches URI patterns |
| 3 | Serve Static (if matched) | ⚠️ | Disk read if not cached |
| 4 | Else Proxy to Backend | | Same as Case 2 |
| 5 | Send Response Back | | Nginx acts as gateway |

✅ Conclusion:

  • Nginx = Traffic Manager
  • Smart separation between static and dynamic content
  • Efficient request routing saves resources

🔹 Case 5: HTTPS (TLS) Request

| Step | Description | CPU Used? | Why |
|---|---|---|---|
| 1 | TCP Connection | | Basic connection setup |
| 2 | TLS Handshake | ✅✅✅ | Expensive: cert validation, RSA/AES/ECC operations |
| 3 | HTTP Parsing | | After TLS tunnel established |

✅ Conclusion:

  • TLS is CPU-heavy
  • TLS Offloading to Cloudflare or Load Balancer is often used

🔹 Case 6: API Request (POST JSON)

🧱 Architecture:

Client → Web Server/API Gateway → Backend → DB

| Step | Description | CPU Used? | Why |
|---|---|---|---|
| 1 | Receive POST | | TCP + header parsing |
| 2 | JSON Body Parsing | ✅✅ | Deserialization consumes CPU |
| 3 | Business Logic | ✅✅ | Auth, validation, core logic |
| 4 | DB Query | ⚠️ | DB fetch/update |
| 5 | Build JSON Response | ✅✅ | JSON.stringify() or equivalent |
| 6 | Send Response | | Network syscall |

✅ Conclusion:

  • APIs (especially large JSON) are CPU-bound
  • Parsing/serializing JSON = CPU cycles
  • Use optimized libraries (like fast-json-stringify, etc.)

🔹 Case 7: File Upload / Download

🧱 Architecture:

Client → Web Server → Disk / Object Store (e.g., S3)

| Step | Description | CPU Used? | Why |
|---|---|---|---|
| 1 | TCP + Parse | | Start request |
| 2 | Read File Chunks (Upload) | ✅ + I/O | Buffered I/O reads |
| 3 | Write to Disk/S3 | ⚠️ | Disk or network-based I/O |
| 4 | Send Acknowledgement | | Final response |

✅ Conclusion:

  • I/O-bound process, CPU handles chunking and buffering
  • Network & Disk performance matter a lot here


HTTP/2 and HTTP/3 Support in Web Servers


🔹 What is HTTP?

  • HTTP (HyperText Transfer Protocol) is an application-layer protocol used for communication between clients (like browsers) and web servers.
  • Versions: HTTP/1.1 → HTTP/2 → HTTP/3

🚀 Why HTTP/2 and HTTP/3?

  • To improve latency, reduce page load times, and utilize modern internet features like multiplexing, better compression, and faster handshake.

🔸 HTTP/1.1 Limitations (Why Upgrade?)

  • Head-of-line (HOL) blocking: One slow resource blocks others.
  • Multiple TCP connections needed → overhead.
  • No compression of headers.
  • High latency in handshake and transfer.

✅ HTTP/2 Features

1. Multiplexing

  • Multiple streams (requests/responses) over a single TCP connection.
  • No need for multiple TCP connections.
┌────────────┐
│ Browser    │
├────────────┤
│ req1       │──────┐
│ req2       │─────►│
│ req3       │──────┘
│            │
└────────────┘
         ↓
     One TCP connection

2. Binary Framing

  • All messages (headers, data) are encoded in binary format instead of plain text → faster and more compact.

3. Header Compression (HPACK)

  • HTTP headers are compressed to save bandwidth.

4. Server Push (Optional)

  • Server can "push" resources (CSS/JS/fonts) before the client even asks.
  • Useful in predictable page loads.

→ Client: GET /index.html
← Server: /index.html + /style.css + /app.js (pushed without asking)

HTTP/3: What Changed Again?

✅ Uses the QUIC protocol instead of TCP

QUIC = Quick UDP Internet Connections (built by Google)

Why QUIC? TCP has these problems:

  • Slow connection setup (3-way handshake)
  • Head-of-Line blocking at the TCP level
  • Connection loss resets everything


🧠 Web Server vs Application Server - Deep Dive


🖥️ 1. What is a Web Server?

🔧 Primary Role:

A web server handles static content such as:

  • HTML
  • CSS
  • JavaScript
  • Images (JPG, PNG, etc.)

It serves files directly from disk to the client browser.

💡 Think of a Web Server like a waiter — it brings pre-cooked food (static files) to your table.


⚙️ Features of Web Server

| Feature | Description |
|---|---|
| Static File Serving | Serves .html, .css, .js, and images directly from the file system |
| SSL/TLS Termination | Handles HTTPS encryption/decryption (SSL certificates) |
| Caching | Stores frequently requested files in memory to improve speed |
| Load Balancing | Distributes incoming requests across multiple App Servers |

🌐 Popular Web Servers

  • Apache HTTPD (older but reliable)
  • Nginx (very fast, efficient)
  • Caddy (auto HTTPS with Let's Encrypt)

🏭 2. What is an Application Server?

🔧 Primary Role:

An Application Server handles dynamic content. It:

  • Executes backend code
  • Fetches data from databases
  • Performs business logic

💡 Think of an Application Server as a chef — it cooks fresh food (generates dynamic content) based on your order (request).


⚙️ Features of App Server

| Feature | Description |
|---|---|
| Code Execution | Runs backend code (e.g., Express, Django, Spring Boot) |
| DB Connectivity | Connects to databases like MySQL, MongoDB, PostgreSQL |
| Session Management | Maintains user sessions, login state, etc. |
| Transactions | Ensures atomic DB operations (commit or rollback) |

💡 Common Examples

| Language | Application Servers |
|---|---|
| Node.js | Express.js, NestJS |
| Java | Tomcat, Jetty, WildFly |
| Python | Django, Flask, FastAPI |
| PHP | Laravel, Symfony |

🔄 3. How They Work Together

Client (Browser / Mobile App)
⬇️
Web Server (Nginx / Apache)
⬇️
Static Route? ➡️ Serve static file directly
⬇️
Dynamic Route? ➡️ Forward to App Server
⬇️
App Server (Express / Django)
⬇️
DB, Business Logic Execution
⬇️
Response sent back via Web Server
⬇️
Client receives result



Why do we separate static and dynamic content handling?

  • Performance: static files (e.g., images, JS) can be cached and served quickly by a web server like Nginx.
  • Scalability: separating allows static content to be offloaded from the heavier app server.
  • Security: keeps the app logic isolated; static servers don't need access to databases or internal logic.
  • Simplicity: web servers are optimized for speed and concurrency, while app servers are optimized for logic and computation.
Can a single server act as both web and application server?

✅ Yes, especially in small-scale setups. Node.js Express, Django, and Spring Boot can all serve both static and dynamic content.

However, in production it's a best practice to separate them:

  • Nginx (web server) handles routing, SSL, and compression.
  • The app server handles dynamic requests.

⚙️ Technical

1. How does Nginx improve performance with caching and load balancing?

  • Caching: stores frequent responses (e.g., HTML pages, JSON APIs) in memory, reducing load on backend app servers and databases.
  • Load balancing: distributes incoming traffic across multiple app servers using methods like Round Robin, Least Connections, and IP Hash, ensuring high availability and scalability.
  • Extra features: connection pooling, GZIP compression, SSL offloading.

2. What happens when an HTTPS request reaches Nginx?

  • TLS handshake: Nginx decrypts the request using the SSL certificate, ensuring data confidentiality and authenticity.
  • Routing: Nginx uses server_name and location blocks to match the request.
  • Proxying (if configured): passes the decrypted request to a backend app server over HTTP (or internal HTTPS).
  • Response: Nginx sends the encrypted response back to the client.

✅ You can also use Nginx as a reverse proxy + SSL terminator.



⛓️ What Is a Presigned URL?

A presigned URL is a special type of temporary, secure link that allows someone to access a specific resource — like a file in cloud storage — without logging in or having permanent credentials.

It gives permission to perform actions like:

  • 🔼 Uploading a file
  • 🔽 Downloading a file
  • ❌ Deleting a file

... for a limited time.

This is especially useful when you:

  • Want users to upload or download files without giving them full access to your server or cloud.
  • Need secure sharing without managing login systems or API keys.

🛠️ How It Works (Behind the Scenes)

Let’s break down the upload process using a YouTube-like example:

✅ Step 1: Client Requests a Presigned URL

When a user wants to upload a video, the client (e.g., browser or mobile app) sends a request to YouTube’s backend asking for a presigned URL.

✅ Step 2: Server Generates Presigned URL

The backend (YouTube server) generates a secure, short-lived URL using:

  • The file path (Key)
  • HTTP method (PUT for upload)
  • Expiry time (e.g., 5 minutes)
  • A cryptographic signature created using AWS credentials

✅ Step 3: URL Is Sent to Client

The server returns the presigned URL to the user’s device.

✅ Step 4: Client Uploads File Directly to Cloud

The client uploads the video directly to S3 using the URL, bypassing the application server entirely.

✅ Step 5: S3 Validates & Stores the File

S3 checks the URL’s validity:

  • Is the signature correct?
  • Has the URL expired?

If valid, the upload is accepted and stored. The backend can then be notified to process or catalog the file.


⚙️ What’s Inside a Presigned URL?

A presigned URL contains:

  • The target resource (bucket + file path)
  • The action allowed (PUT, GET, DELETE)
  • Expiry timestamp
  • A secure signature (HMAC with access key)

This ensures that only authorized, time-bound operations are allowed.
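
Server-side generation can be sketched with the AWS SDK v3 (the bucket, key, region, and expiry are illustrative):

const { S3Client, PutObjectCommand } = require('@aws-sdk/client-s3');
const { getSignedUrl } = require('@aws-sdk/s3-request-presigner');

const s3 = new S3Client({ region: 'us-east-1' });

// Allow exactly one action (PUT) on one key, valid for 5 minutes
async function createUploadUrl() {
  const command = new PutObjectCommand({ Bucket: 'my-videos', Key: 'uploads/video.mp4' });
  return getSignedUrl(s3, command, { expiresIn: 300 });
}

createUploadUrl().then((url) => console.log(url)); // the client PUTs the file to this URL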


🚀 Why Use Presigned URLs Instead of Traditional Uploads?

| Traditional Upload | Presigned URL |
|---|---|
| File flows through backend | File uploads directly to cloud |
| Backend must handle large files | Backend just creates the URL |
| Slower and expensive | Fast and scalable |
| Higher server load | Offloaded to cloud (e.g., S3) |
| Exposes infrastructure to risks | Link auto-expires, more secure |

✅ Presigned URLs are:

  • 🚀 Faster
  • 💰 Cheaper
  • 🔐 More secure
  • 🌐 Easier to scale

🌐 AJAX – Asynchronous JavaScript and XML

✅ What is AJAX?

AJAX is a technique used in web development to send and receive data from a server asynchronously without reloading the entire web page.

🔁 AJAX allows partial page updates, making web apps fast and interactive.


🧠 Full Form:

Asynchronous JavaScript And XML (originally XML; today JSON is mostly used instead)


📱 Real-World Example:

Google Search Suggestions:

When you type in Google’s search bar, suggestions appear immediately without reloading the page. This is powered by AJAX.


⚙️ Technologies Involved:

| Technology | Role |
|---|---|
| HTML/CSS | Structure & styling |
| JavaScript | Logic and events |
| XMLHttpRequest / fetch() | Send/receive data to/from the server |
| JSON/XML | Data format used for communication |
| DOM | Updating the web page dynamically |

🔁 How AJAX Works (Step-by-Step):

  1. User interacts with the web page (e.g., clicks a button).
  2. JavaScript sends a request to the server (in background).
  3. Server processes the request and sends data back.
  4. JavaScript receives the data and updates the web page (without reload).

📦 Example Code (Using fetch API):

// Send AJAX request to server
fetch('/api/user')
  .then(response => response.json())
  .then(data => {
    // Update page dynamically
    document.getElementById('username').innerText = data.name;
  });


Database Partitioning vs Sharding

🔍 Introduction

As data grows exponentially in modern systems, managing and querying large datasets efficiently becomes critical. Two common approaches to handle large-scale databases are:

  • Partitioning: Dividing data within a single database.
  • Sharding: Distributing data across multiple databases or servers.

Both techniques improve performance, scalability, and maintainability, but they serve different purposes and operate at different levels of system architecture.


1️⃣ What is Partitioning?

✅ Definition:

Partitioning is the process of dividing a single large table or index into smaller, manageable pieces called partitions.

These partitions are still part of the same logical table and are managed by the same database engine.

🔧 Types of Partitioning:

| Type | Description | Use Case |
|---|---|---|
| Range | Data split by value range in a column | Time-based data (logs, sales) |
| List | Data split by discrete column values | Country/region/user-type |
| Hash | Data distributed via a hash function | Even load distribution |
| Composite | Combines two types (e.g., Range + Hash) | Multi-dimensional datasets |
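
For instance, range partitioning in PostgreSQL looks like this (the table and ranges are illustrative):

-- Parent table declares the partitioning scheme
CREATE TABLE sales (
  id         bigint,
  sale_date  date NOT NULL,
  amount     numeric
) PARTITION BY RANGE (sale_date);

-- Each partition holds one year of rows; queries filtering on sale_date
-- only scan the matching partition (partition pruning)
CREATE TABLE sales_2024 PARTITION OF sales
  FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');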

🧱 Horizontal vs Vertical Partitioning:

| Type | Description | Use Case |
|---|---|---|
| Horizontal | Split rows across partitions | Logs, user records, transactions |
| Vertical | Split columns across tables | Sensitive vs non-sensitive data |

✅ Benefits

  • Faster queries (due to partition pruning)
  • Easier maintenance (backup/drop/archive)
  • Scalability within a single database

⚠️ Drawbacks

  • Added schema complexity
  • Not all DBs support all partition types
  • Uneven data can cause data skew

2️⃣ What is Sharding?

✅ Definition:

Sharding is the process of splitting a dataset across multiple physical databases or servers, each called a shard.

Each shard holds a subset of the entire data and can be queried independently.

🔧 Types of Sharding:

| Type | Description | Use Case |
|---|---|---|
| Horizontal | Different rows in each shard | Large user base split by user_id |
| Vertical | Different tables or services per shard | Microservices with separate schemas |
| Geo-Sharding | Based on geography or region | Global apps (e.g., Asia, EU users) |

🧱 Example:

| Shard | Data Range |
|---|---|
| Shard 1 | user_id 1–10 million |
| Shard 2 | user_id 10M–20 million |
| Shard 3 | user_id 20M–30 million |

🛠 Tools That Support Sharding:

  • MongoDB (built-in)
  • Vitess (MySQL)
  • Citus (PostgreSQL)
  • Cassandra (sharded by design)
  • ElasticSearch (auto-sharding)

✅ Benefits

  • True horizontal scaling
  • Improved availability & fault isolation
  • Handles very large datasets across regions

⚠️ Drawbacks

  • Complex to implement and maintain
  • Cross-shard joins are difficult
  • Requires careful shard key design
  • Complex backup & consistency management

🔁 Partitioning vs Sharding: Comparison Table

| Feature | Partitioning | Sharding |
|---|---|---|
| Scope | Inside one database | Across multiple databases/servers |
| Managed By | Database engine | Application or shard middleware |
| Logical Unit | Table partition | Database/shard |
| Cross-Partition Joins | Supported | Difficult or unsupported |
| Scalability | Limited to the DB machine | Horizontally scalable |
| Use Case | Structured, large tables | Global-scale systems (Facebook, etc.) |

📌 Summary

  • Partitioning is suitable for scaling within a single database and improving query performance for large tables.
  • Sharding is ideal for massive-scale, distributed systems that require true horizontal scaling and fault tolerance.

Use the right strategy based on your system's architecture, data volume, and scalability requirements.





🧭 Difference Between Observability and Monitoring

| Aspect | Monitoring | Observability |
|---|---|---|
| 🔍 Definition | Collecting predefined metrics to track system health | Understanding the internal state of a system by analyzing outputs |
| 🎯 Goal | Detect known issues and alert when something breaks | Investigate and diagnose unknown or complex issues |
| 🔧 Approach | Reactive – predefined checks and dashboards | Proactive – enables asking new questions and exploring behavior |
| 🔬 Focus | Known problems | Unknown unknowns |
| 🧱 Components | Metrics, alerts, dashboards | Metrics + Logs + Traces (the 3 Pillars of Observability) |
| 📊 Tools | Prometheus, Nagios, Zabbix | OpenTelemetry, Grafana, Jaeger, Honeycomb |
| 🚨 Use case | Alert when CPU > 90% | Understand why latency is increasing randomly |
| 💡 Analogy | A thermometer shows temperature | A doctor uses symptoms + scans + history to diagnose |

📦 Example

Monitoring:

  • You set a rule: “Alert me if memory usage goes above 90%”.
  • You get notified when it does.

Observability:

  • Your app slows down.
  • You don't know why.
  • You dive into metrics, traces, logs – see a DB call is slow due to network latency.
  • You find a misconfigured load balancer in a specific region.

Key Takeaway:

Monitoring is a subset of Observability.

Observability is about having enough data and tooling to answer any question about your system, even if you didn’t anticipate the issue in advance.



📡 What is OpenTelemetry?

OpenTelemetry is a vendor-neutral, open-source observability framework by the CNCF that provides standardized tools to collect, process, and export telemetry data — specifically metrics, logs, and traces — from applications and infrastructure.

It consists of:

  • SDKs for instrumentation, and
  • A collector component that receives telemetry data, processes it (like batching or sampling), and exports it to observability backends like New Relic, Prometheus, Jaeger, or any OTLP-compatible platform.

💪 Why OpenTelemetry is Powerful

What makes OpenTelemetry powerful is that it decouples telemetry generation from storage or visualization.

You write once using OTel SDKs and can export to any backend without being locked into a vendor.
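
A minimal Node.js instrumentation sketch (these are the standard OTel packages; the collector URL is an assumption):

const { NodeSDK } = require('@opentelemetry/sdk-node');
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-http');
const { getNodeAutoInstrumentations } = require('@opentelemetry/auto-instrumentations-node');

// Traces are exported over OTLP; swapping backends means swapping
// only this exporter config, not the instrumentation itself
const sdk = new NodeSDK({
  traceExporter: new OTLPTraceExporter({ url: 'http://localhost:4318/v1/traces' }),
  instrumentations: [getNodeAutoInstrumentations()],
});

sdk.start();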


🧪 Real-World Example

In my previous project at Janitri, I used OpenTelemetry SDKs in the backend to instrument REST APIs and used the OpenTelemetry Collector to forward metrics to Prometheus.

Logs and traces were optionally integrated via extensions.


🔄 In a New Relic Setup

This same SDK can send data directly to New Relic via the OTLP exporter, giving you full-stack visibility — with no vendor-specific lock-in.


🎯 Conclusion

That’s the beauty of OpenTelemetry:

  • It’s interoperable
  • It’s future-proof
  • It aligns deeply with New Relic’s support for open standards


📊 What is Prometheus?

Prometheus is an open-source, time-series database and monitoring system originally developed by SoundCloud and now part of the CNCF (Cloud Native Computing Foundation).

It is designed to collect and store metrics from systems and applications using a pull-based model.


⚙️ How Prometheus Works

  • Prometheus scrapes data from exposed endpoints (typically /metrics).
  • It stores this data in its local time-series database (TSDB).
  • Querying is done using its powerful query language called PromQL.
  • It supports rule-based alerting via the companion Alertmanager component.
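
For example, a Node.js service can expose such a scrape endpoint with the prom-client package (a sketch; the port is illustrative):

const express = require('express');
const client = require('prom-client');

const app = express();

// Collect default process metrics (event loop lag, heap usage, etc.)
client.collectDefaultMetrics();

// Prometheus scrapes this endpoint on its own schedule (pull model)
app.get('/metrics', async (req, res) => {
  res.set('Content-Type', client.register.contentType);
  res.end(await client.register.metrics());
});

app.listen(9100);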

📌 Key Characteristics

| Feature | Description |
|---|---|
| 🔄 Pull-Based Model | Prometheus pulls metrics from targets, instead of targets pushing data |
| 📈 Metric-Focused | Handles only metrics (no support for logs or traces) |
| 🧠 PromQL | A flexible and powerful query language |
| 🚫 No Built-in Clustering | No native clustering or long-term storage out of the box |
| 🔗 Extensibility | Can be extended using projects like Thanos or Cortex for high availability |

👨‍💻 Real-World Example (Janitri Project)

In my project at Janitri, I used Prometheus alongside OpenTelemetry to collect real-time metrics related to API performance.

I visualized this data using Grafana, which gave immediate insights, although the setup required some effort and configuration.


🤝 Why Prometheus with OpenTelemetry?

OpenTelemetry is a telemetry generation and export framework — not a full observability stack.

It collects metrics, logs, and traces from applications using SDKs and exports them to a backend.

Prometheus is one such backend — specialized in metrics.


🔁 Integration Flow

  1. I used OpenTelemetry SDKs to instrument my application.
  2. Then I used the OpenTelemetry Collector to expose the metrics in Prometheus format on a /metrics endpoint.
  3. Prometheus scraped this data, stored it, and allowed me to:
    • Query it using PromQL
    • Set up alerts via Alertmanager

🔗 Conclusion

Prometheus completed what OpenTelemetry started

  • 🛠️ OTel was the producer
  • 🧠 Prometheus was the consumer, storage, and query engine

This architecture was:

  • Modular
  • 🔄 Flexible
  • 🔮 Future-proof

If needed, I could easily swap Prometheus with any OTLP-compatible backend (e.g., New Relic) without changing instrumentation code.

That’s the power of combining OpenTelemetry with open, pluggable tools like Prometheus.




🧭 Full Observability Stack using OpenTelemetry

This architecture illustrates how telemetry flows from instrumented code all the way to dashboards using tools like OpenTelemetry, Prometheus, Loki, Jaeger, and Grafana.


1️⃣ Instrumentation Layer (Your Code)

Add OpenTelemetry SDKs to generate telemetry (metrics, logs, traces).

You can use:

  • Auto-instrumentation agents

    (e.g. for Node.js, Python, Java)

  • Manual instrumentation

    (tracer.startSpan(), meter.record(), etc.)


2️⃣ Collector Layer

The OpenTelemetry Collector is the heart of the pipeline:

  • Receives data via receivers
  • Processes data (optional) via processors
  • Sends data to exporters (e.g., Prometheus, Jaeger)

You can run the Collector as:

  • 🟢 Agent – runs locally on each host (lightweight)
  • 🟣 Gateway – centralized telemetry router (common in prod)

3️⃣ Backend Layer

These are the specialized storage tools for each data type:

| Data Type | Tool | Purpose |
|---|---|---|
| Metrics | Prometheus | Monitoring, alerting, dashboards |
| Logs | Loki | Log aggregation & searchable logs |
| Traces | Jaeger/Tempo | Distributed tracing & request flow |

These tools store and index the telemetry so that Grafana (or New Relic) can query them.


4️⃣ Visualization Layer (Grafana)

  • Grafana connects to:

    • Prometheus (for metrics)
    • Loki (for logs)
    • Jaeger/Tempo (for traces)
  • Unified dashboards for all observability pillars

  • Create alerts (e.g., CPU > 80%, error rate > 5%)

  • Supports full correlation:

    • Logs → Traces → Metrics from one screen

🧠 Key Interview Lines You Can Drop

  • “The OpenTelemetry Collector acts as a hub where all telemetry — metrics, logs, traces — is routed, transformed, and exported.”

  • “Grafana sits on top as the visual UI, but the data lifeblood flows from instrumented apps through OpenTelemetry.”

  • “In a real production setup, this model gives me flexibility: swap out Prometheus with New Relic just by changing the exporter.”







Git Merge vs Rebase vs Squash - Complete Guide

The Problem

You have a feature branch with commits A, B, C. Meanwhile, commits D and E have landed on main. Now what?

main:     1---2---D---E
               \
feature:        A---B---C

Option 1: Git Merge 🔗

What happens:

git checkout main
git merge feature-branch

Result:

main: 1---2---D---E---M
           \         /
feature:    A---B---C

Simple Explanation:

  • A merge commit (M) is created
  • The history of both branches is preserved
  • The graph forms a "knot"-like structure

When to use:

  • When you want the complete history
  • When team collaboration needs transparency
  • When you want to track the feature branch's detailed development

Option 2: Git Rebase ↗️

What happens:

git checkout feature-branch
git rebase main

Result:

main: 1---2---D---E---A'---B'---C'

Simple Explanation:

  • Moves the feature branch commits onto the tip of main
  • Produces a clean, linear history
  • Original commits A, B, C become A', B', C' (new commit IDs)

When to use:

  • When you want a clean, linear history
  • To avoid complex merge conflicts
  • Preferred in professional projects

Option 3: Squash Commits 🗜️

What happens:

git checkout main
git merge --squash feature-branch
git commit -m "Add complete feature X"

Result:

main: 1---2---D---E---S

Simple Explanation:

  • Combines all the feature commits (A+B+C) into a single commit (S)
  • Main shows just one clean commit
  • The detail of the individual commits is lost on main

When to use:

  • When you want a clean history on main
  • When feature development details aren't needed on main
  • A popular approach on GitHub/GitLab

Real World Scenarios 🌍

Scenario 1: Small Personal Project

Use: Simple merge

  • History complexity doesn't matter
  • Quick and easy

Scenario 2: Professional Team Project

Use: Rebase + Fast-forward merge

  • Clean linear history
  • Easy to track changes
  • Professional appearance

Scenario 3: Open Source Project

Use: Squash commits

  • Main stays clean
  • Contributors' detailed work is preserved on the feature branch
  • Easy to review and roll back

Commands Summary 📝

Merge:

git checkout main
git merge feature-branch

Rebase:

git checkout feature-branch
git rebase main
git checkout main
git merge feature-branch  # Fast-forward merge

Squash:

git checkout main
git merge --squash feature-branch
git commit -m "Descriptive message"
