MUHAMMAD USMAN AWAN

Posted on Oct 29

Indexing — Architecture Series: Part 2

#architecture #node #webdev #javascript

🚀 Complete Guide to Indexing (Database + Frontend)

🧠 What Is Indexing?

Indexing is the process of creating a data structure (often a B-tree, hash table, or specialized structure) that makes data retrieval dramatically faster — similar to the index in a book.
Instead of scanning the entire dataset, the system jumps directly to the relevant location.

Without an index: the database performs a full table scan (O(n))
With an index: the database performs a logarithmic or constant-time lookup (O(log n) or O(1))

🔍 Analogy:

Looking for “Database Indexing” in a 500-page book:

Without index → Read all 500 pages.

With index → Jump straight to page 127.

⚡ Why Use Indexing?

Benefit	Description
Speed	Reduces query time from seconds/minutes to milliseconds.
Scalability	Keeps performance stable as data grows.
Efficiency	Less CPU, disk I/O, and memory usage.
Data Integrity	Unique indexes enforce constraints.
Cost Savings	Lower compute and storage overhead on cloud databases.

🧱 But Why Not Index Everything?

Drawback	Description
Write Overhead	Slows `INSERT`, `UPDATE`, `DELETE` (each index must update).
Space Consumption	Indexes can take up to 2–3x table size.
Maintenance	Need periodic rebuilds to avoid fragmentation.
Optimizer Confusion	Too many indexes can lead to poor query plans.

Use indexes strategically — for columns frequently used in:

WHERE, JOIN, ORDER BY, or GROUP BY
Search and filter queries
Uniqueness checks

🧩 How Indexing Works (B-Tree Example)

            [50]
           /    \
        [25]    [75]
       /  \     /  \
     [10][40] [60][90]

Finding 60:

Compare with 50 → go right.
Compare with 75 → go left.
Found in [60]. → Only 3 comparisons instead of scanning thousands.

🗂️ Types of Indexes (SQL + NoSQL)

1. Clustered Index

Defines physical row order in the table (e.g., primary key).
One per table.

2. Non-Clustered Index

Separate structure storing key → row reference mapping.
Can have multiple per table.

3. Composite (Multi-Column) Index

Example: (user_id, status, created_at)
Order matters:
- ✅ Queries on user_id or (user_id, status)
- ❌ Queries on status alone

4. Unique Index

Prevents duplicate values, like unique emails or usernames.

5. Partial (Filtered) Index

Indexes subset of data (e.g., only WHERE status='active').

6. Full-Text Index

Enables searching inside text fields (e.g., blog posts).

7. Spatial / Geospatial Index

Optimized for geo queries ($near, ST_DWithin, etc.).

🧱 Backend Implementation Examples

🧩 SQL Example (PostgreSQL/MySQL)

-- Create table with a primary key (clustered index)
CREATE TABLE users (
  id SERIAL PRIMARY KEY,
  name VARCHAR(100),
  email VARCHAR(100) UNIQUE,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Additional indexes
CREATE INDEX idx_users_email ON users (email);
CREATE INDEX idx_users_name_created ON users (name, created_at);

-- Partial index (only active users)
CREATE INDEX idx_users_email_active
ON users(email)
WHERE status = 'active';

-- Full-text index (PostgreSQL)
CREATE INDEX idx_articles_text
ON articles USING gin(to_tsvector('english', title || ' ' || content));

✅ Use EXPLAIN ANALYZE to confirm:

Index Scan using idx_users_email on users ...

❌ If you see “Seq Scan” → index not being used.

🧩 MongoDB Example

// Create single-field index
db.users.createIndex({ email: 1 });

// Create compound index
db.orders.createIndex({ userId: 1, status: 1, createdAt: -1 });

// Unique index
db.users.createIndex({ email: 1 }, { unique: true });

// Partial index
db.users.createIndex(
  { email: 1 },
  { partialFilterExpression: { status: 'active' } }
);

// Full-text index
db.articles.createIndex({ title: "text", content: "text" });

// Geospatial index
db.restaurants.createIndex({ location: "2dsphere" });

Check performance:

db.users.find({ email: "john@example.com" }).explain("executionStats");

🧠 Node.js Example: Index Management

const { MongoClient } = require("mongodb");
const client = new MongoClient("mongodb://localhost:27017");
await client.connect();
const db = client.db("app");

await db.collection("users").createIndex({ email: 1 }, { unique: true });
await db.collection("orders").createIndex({ userId: 1, createdAt: -1 });

// Analyze query efficiency
const stats = await db.collection("users")
  .find({ email: "test@example.com" })
  .explain("executionStats");

console.log(stats.executionStats);

🧑‍💻 Frontend Indexing

Frontend “indexing” isn’t the same as in databases — but it helps with fast local search, filtering, or lookups on client-side data.

⚙️ Common Approaches

Technique	Use Case
JavaScript Map/Set	O(1) lookups for small datasets
Lunr.js / Fuse.js	Full-text search in SPAs
IndexedDB	Persistent client-side storage with indexes
Web Workers	Offload large dataset processing

🧭 Example: React Client-Side Indexing

import React, { useState, useEffect } from "react";
import lunr from "lunr";
import axios from "axios";

function UserSearch() {
  const [users, setUsers] = useState([]);
  const [emailMap, setEmailMap] = useState(new Map());
  const [searchIndex, setSearchIndex] = useState(null);
  const [query, setQuery] = useState("");
  const [results, setResults] = useState([]);

  useEffect(() => {
    const fetchUsers = async () => {
      const res = await axios.get("/api/users");
      setUsers(res.data);

      // Map for O(1) lookup
      const map = new Map(res.data.map(u => [u.email, u]));
      setEmailMap(map);

      // Lunr full-text index
      const idx = lunr(function () {
        this.ref("id");
        this.field("name");
        this.field("email");
        res.data.forEach(u => this.add(u));
      });
      setSearchIndex(idx);
    };
    fetchUsers();
  }, []);

  const handleSearch = e => {
    const q = e.target.value;
    setQuery(q);
    if (!searchIndex || !q) return setResults([]);
    const matches = searchIndex.search(q);
    setResults(matches.map(r => users.find(u => u.id === parseInt(r.ref))));
  };

  return (
    <div>
      <input value={query} onChange={handleSearch} placeholder="Search user" />
      <ul>{results.map(u => <li key={u.id}>{u.name} - {u.email}</li>)}</ul>
      <button onClick={() => console.log(emailMap.get("john@example.com"))}>
        Quick Lookup
      </button>
    </div>
  );
}

🧮 Best Practices & Edge Cases

#	Principle	Explanation
1	Avoid indexing small tables	Not worth the overhead (<1000 rows).
2	Monitor writes	Bulk inserts? Disable indexes, re-enable after.
3	Composite order matters	`(a,b)` index ≠ `(b,a)` query.
4	Cardinality awareness	Skip low-cardinality columns (e.g., gender).
5	Use EXPLAIN often	Verify index usage in real queries.
6	Avoid over-indexing	Too many indexes can harm performance.
7	Rebuild regularly	Avoid fragmentation (Postgres: `REINDEX`, MySQL: `OPTIMIZE`).
8	Use covering indexes	Include all queried columns → no table read.
9	Security unaffected	Indexes don’t override access control.
10	Frontend caution	Large datasets → paginate or use backend indexing.

🧰 Tools to Analyze Indexes

PostgreSQL: EXPLAIN ANALYZE, pg_stat_user_indexes
MySQL: EXPLAIN, SHOW INDEXES
MongoDB: .explain("executionStats")
Redis / Elasticsearch: Built-in query analyzers
Frontend: Chrome DevTools → Memory tab (check Map usage)

🧭 In Summary

Context	Tool / API	Purpose
SQL	`CREATE INDEX`, `EXPLAIN ANALYZE`	Query speed & data integrity
MongoDB	`.createIndex()`, `.explain()`	Fast queries & filters
Search	Lunr.js, Fuse.js	Local text indexing
Frontend	`Map`, `Set`, IndexedDB	In-memory lookups
Geo	`2dsphere`, PostGIS	Location-based queries

✅ TL;DR — Indexing is your performance multiplier.

Use indexes for read-heavy apps (dashboards, analytics, e-commerce).
Monitor and tune with EXPLAIN tools.
Combine backend precision with frontend responsiveness for the best UX.

If you missed Part 1 — Pagination, check it out first. It explains how to efficiently handle large datasets by loading data in chunks instead of fetching everything at once. Pagination reduces server load, improves performance, and gives users a smoother browsing experience.

DEV Community