DEV Community: Zaki-goumri

Some indexing data structures

Zaki-goumri — Sun, 14 Dec 2025 14:35:36 +0000

When working with databases, we often think about queries, schemas, and APIs. But underneath all of that lies a more fundamental concern:
How does a database find data efficiently?

Indexes are the data structures that answer this question. They allow databases to locate records quickly without scanning entire files on disk.

Designing indexes for disk-based storage is very different from designing in-memory data structures. Disk access is slow, memory is limited, crashes must be handled safely, and data is constantly being updated. These constraints have led to specialized indexing structures that look very different from the data structures we use in application code.

In this article, we’ll explore several core index data structures used in modern storage engines:

Hash indexes, the simplest form of key-based indexing
Sorted String Tables (SSTables), which store data in sorted order on disk
Log-Structured Merge Trees (LSM-Trees), which organize and merge SSTables efficiently at scale
B-Trees, the classic indexing structure behind many databases

Rather than focusing on formal definitions, the goal is to build an intuitive understanding of how these structures work, why they exist, and what trade-offs they introduce. By the end, you should be able to reason about why different databases choose different index designs.

Let’s begin with the simplest building block: the hash index.

Hash Indexes

When we start exploring database indexes, the simplest and most intuitive structure is the hash index. Hash indexes are based on the same concept as in-memory hash maps or dictionaries in programming languages: a key maps directly to a value.

In a database, the goal is the same: quickly find the location of a record on disk using its key.

How a Hash Index Works

Imagine a database that only appends data to a log file on disk. Each record consists of a key-value pair. To find data efficiently, the database maintains an in-memory hash map that stores the mapping:

Key → Byte offset in the log file

Whenever a new record is written:

Append it to the log file.
Update the in-memory hash map to point to the new offset.

When reading a value:

Look up the key in the hash map.
Seek to the offset in the log file.
Read the value.
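
The write and read steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not how any particular engine does it: the log is simulated with an in-memory string instead of a file, records are JSON lines, offsets are character positions rather than true byte offsets, and the name HashIndexedLog is made up for the example.

```javascript
// Sketch only: an in-memory string stands in for the append-only log file.
class HashIndexedLog {
  constructor() {
    this.log = "";          // simulated file contents
    this.index = new Map(); // key -> offset of the latest record for that key
  }

  set(key, value) {
    const offset = this.log.length;                  // where the new record starts
    this.log += JSON.stringify([key, value]) + "\n"; // 1. append to the log
    this.index.set(key, offset);                     // 2. point the index at it
  }

  get(key) {
    const offset = this.index.get(key);              // 1. look up the key
    if (offset === undefined) return undefined;
    const end = this.log.indexOf("\n", offset);      // 2. "seek" to the offset
    const [, value] = JSON.parse(this.log.slice(offset, end)); // 3. read the value
    return value;
  }
}

const db = new HashIndexedLog();
db.set("user:1", "Ada");
db.set("user:1", "Grace"); // the old record stays in the log; only the index moves
console.log(db.get("user:1")); // Grace
```

Note that the overwritten record is still sitting in the log, which is exactly the garbage that compaction later removes.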

Segment Files and Compaction

If the database only appended to a single log file, it would grow indefinitely. To solve this:

Logs are split into segments (files of fixed size).
Old segments are immutable.
A background compaction process merges multiple segments, keeping only the latest value for each key.

This ensures:

Disk space is reclaimed.
Reads remain efficient.
Old data is cleaned up without affecting ongoing writes.

Handling Deletes (Tombstones)

To delete a key:

A special record called a tombstone is appended to the log.
During compaction, the tombstone ensures that all previous values for that key are discarded.

This keeps deletes consistent without needing in-place updates.
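
Compaction and tombstones fit together in one small routine. The sketch below assumes each segment is an array of [key, value] pairs in write order, that segments are passed oldest first, and that a TOMBSTONE symbol marks a delete; all of these names and representations are illustrative choices.

```javascript
// Sketch only: segments are arrays of [key, value] pairs, oldest segment first.
const TOMBSTONE = Symbol("tombstone"); // hypothetical marker for a deleted key

function compactSegments(segments) {
  const latest = new Map();
  for (const segment of segments) {
    for (const [key, value] of segment) {
      latest.set(key, value); // later records overwrite earlier ones
    }
  }
  // Dropping tombstones is only safe because every older segment that could
  // still hold the key is part of this merge.
  const merged = [];
  for (const [key, value] of latest) {
    if (value !== TOMBSTONE) merged.push([key, value]);
  }
  return merged;
}

const compacted = compactSegments([
  [["a", 1], ["b", 2]],
  [["a", 3], ["b", TOMBSTONE], ["c", 4]],
]);
console.log(JSON.stringify(compacted)); // [["a",3],["c",4]]
```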

Crash Safety and Recovery

Hash indexes are append-only, which simplifies crash recovery:

Partially written records are detected using checksums.
In-memory hash maps are rebuilt on restart either by scanning segments or loading a snapshot saved on disk.

This ensures the database can recover quickly even after a crash.
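
As an illustration of the recovery scan, the sketch below prefixes each record with a checksum and rebuilds the index by walking the log until it hits a torn or corrupted record. The record format, the FNV-1a-style hash over UTF-16 code units, and the function names are assumptions made for the example; real engines typically use CRC32 or similar over raw bytes.

```javascript
// Sketch only: FNV-1a-style 32-bit hash over UTF-16 code units.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// Record format (hypothetical): "<hex checksum> <json payload>\n"
function encodeRecord(key, value) {
  const payload = JSON.stringify([key, value]);
  return fnv1a(payload).toString(16) + " " + payload + "\n";
}

// Scan the log from the start and rebuild key -> offset. Stop at the first
// record that is truncated or fails its checksum.
function rebuildIndex(log) {
  const index = new Map();
  let offset = 0;
  while (offset < log.length) {
    const end = log.indexOf("\n", offset);
    if (end === -1) break; // torn write: the record never got its terminator
    const line = log.slice(offset, end);
    const space = line.indexOf(" ");
    if (space === -1) break;
    const payload = line.slice(space + 1);
    if (line.slice(0, space) !== fnv1a(payload).toString(16)) break; // corrupted
    index.set(JSON.parse(payload)[0], offset);
    offset = end + 1;
  }
  return index;
}

const log = encodeRecord("a", 1) + encodeRecord("b", 2) + encodeRecord("a", 3).slice(0, 6);
console.log(rebuildIndex(log).size); // 2 (the torn third record is ignored)
```

Stopping at the first bad record is a simplification; the point is only that a checksum lets the scan tell a complete record from a partial one.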

Concurrency

Hash indexes are easy to make thread-safe:

Usually, one writer thread appends to the log.
Multiple readers can read segments concurrently.
Immutable segments eliminate the need for complex locking.

Advantages of Hash Indexes
Fast exact key lookups (O(1) in memory).
Simple design and high write throughput.
Easy crash recovery due to append-only files.

Limitations

Memory-bound: all keys must fit in RAM.
No ordering: cannot efficiently perform range queries.
Not ideal for very large datasets if the key space is huge.

These limitations motivate Sorted String Tables (SSTables), which preserve key order on disk and form the foundation for LSM-Trees, which we’ll discuss next.

SSTables (Sorted String Tables)

While hash indexes are fast for exact key lookups, they have some limitations:

The full set of keys must fit in memory.
They do not preserve key order, so range queries are inefficient.
On-disk hash maps are hard to maintain efficiently.

SSTables solve these problems by storing key-value pairs in sorted order on disk.

What is an SSTable?

An SSTable (Sorted String Table) is a file that contains a sequence of key-value pairs:

Sorted by key.
Each key appears only once per SSTable.
Immutable after being written to disk.

This design allows efficient merges, range queries, and reduces the need for a large in-memory index.

Constructing an SSTable

Incoming writes are unordered, so we maintain a memtable in memory:

Memtable = in-memory balanced tree (e.g., red-black tree) storing key-value pairs.
When the memtable exceeds a threshold (a few MB), write it to disk as an SSTable.
While writing the SSTable, a new memtable handles incoming writes.

Reading from SSTables

To find a key:

Check the memtable first.
Check the most recent SSTable, then the next-most-recent, etc.
Because SSTables are sorted, we can use sparse in-memory indexes:

Only store offsets for some keys.
Scan a few KBs in the file to find the exact key.
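
A sparse index lookup can be sketched as a binary search over the sampled keys followed by a short scan. In this illustration the SSTable is a sorted in-memory array of [key, value] pairs and "offsets" are array positions; the function names and the sampling interval are assumptions for the example.

```javascript
// Sketch only: the SSTable is a sorted array; positions stand in for file offsets.
function buildSparseIndex(entries, every = 3) {
  const index = [];
  for (let i = 0; i < entries.length; i += every) {
    index.push({ key: entries[i][0], position: i }); // remember every Nth key
  }
  return index;
}

function lookup(entries, sparseIndex, key) {
  // Binary search for the last sampled key that is <= the target key.
  let lo = 0;
  let hi = sparseIndex.length - 1;
  let slot = -1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (sparseIndex[mid].key <= key) {
      slot = mid;
      lo = mid + 1;
    } else {
      hi = mid - 1;
    }
  }
  if (slot === -1) return undefined; // key sorts before everything in the file

  // Scan only the block between this sampled key and the next one.
  const from = sparseIndex[slot].position;
  const to = slot + 1 < sparseIndex.length ? sparseIndex[slot + 1].position : entries.length;
  for (let i = from; i < to; i++) {
    if (entries[i][0] === key) return entries[i][1];
    if (entries[i][0] > key) break; // sorted order: the key cannot appear later
  }
  return undefined;
}

const table = [["apple", 1], ["banana", 2], ["cherry", 3], ["date", 4], ["elder", 5], ["fig", 6], ["grape", 7]];
const sparse = buildSparseIndex(table); // holds only apple, date, grape
console.log(lookup(table, sparse, "elder")); // 5
```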

Merging and Compaction

SSTables are immutable, so updates and deletes never modify an existing file; they reach disk as entries in new SSTables:

Old SSTables are merged and compacted in the background.
During merge:
- Keep only the most recent value for each key.
- Discard obsolete or deleted entries.
Result = fewer files, sequential writes, and efficient storage.
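
Because both inputs are already sorted, the merge works like the merge step of mergesort and never needs the whole dataset in memory. A minimal sketch for two SSTables, assuming each is a sorted array of [key, value] pairs and that the second argument is the newer file (the function name is illustrative):

```javascript
// Sketch only: merge two sorted runs; on a key collision the newer value wins.
function mergeSSTables(older, newer) {
  const merged = [];
  let i = 0;
  let j = 0;
  while (i < older.length && j < newer.length) {
    if (older[i][0] < newer[j][0]) {
      merged.push(older[i++]);
    } else if (older[i][0] > newer[j][0]) {
      merged.push(newer[j++]);
    } else {
      merged.push(newer[j++]); // same key in both: keep the most recent value
      i++;                     // and skip the obsolete one
    }
  }
  while (i < older.length) merged.push(older[i++]);
  while (j < newer.length) merged.push(newer[j++]);
  return merged;
}

const mergedTable = mergeSSTables(
  [["a", 1], ["c", 3], ["d", 4]],
  [["b", 20], ["c", 30]],
);
console.log(JSON.stringify(mergedTable)); // [["a",1],["b",20],["c",30],["d",4]]
```

The output is written sequentially and is itself sorted, so it is a valid SSTable that can take part in later merges.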

Advantages of SSTables

Preserve sorted order, enabling range queries.
Immutable, which simplifies concurrency and crash recovery.
Can store datasets larger than memory efficiently.
Background compaction keeps disk usage optimal.

Limitations

Reads may need to check multiple SSTables to find a key.
Writes generate temporary files and require compaction.

These challenges are addressed by LSM-Trees, which organize multiple SSTables into a hierarchy for high write throughput and efficient reads.

LSM-Trees (Log-Structured Merge-Trees)

While SSTables solve the problems of hash indexes by keeping keys sorted and enabling range queries, reading a key may still require checking multiple SSTables.

Log-Structured Merge-Trees (LSM-Trees) organize SSTables into a hierarchical structure to optimize both writes and reads.

What is an LSM-Tree?

An LSM-Tree is essentially a cascade of SSTables:

Memtable (in-memory balanced tree) receives incoming writes.
When the memtable fills up, it is flushed to disk as an SSTable.
Older SSTables are gradually merged and compacted into larger SSTables at higher-numbered levels.

This creates multiple levels of SSTables, where:

Level 0 = most recent SSTables
Higher levels = older, merged SSTables

How Writes Work

Write comes in → appended to a log on disk first (for crash recovery).
Write is then added to the memtable.
When memtable exceeds threshold → flush to a new SSTable at Level 0.
Background compaction merges SSTables into higher levels.

Key Points:

Writes are mostly sequential, maximizing disk throughput.
Old SSTables are never modified; only merged into new SSTables.
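
The write path can be condensed into a toy engine. This is a rough sketch under several assumptions: the write-ahead log and the SSTables are plain in-memory arrays, the memtable is a Map that is sorted only at flush time (instead of a balanced tree), the flush threshold counts keys rather than bytes, and the class name TinyLsm is invented for the example. Compaction is left out.

```javascript
// Sketch only: everything lives in memory; a real engine writes files.
class TinyLsm {
  constructor(memtableLimit = 2) {
    this.memtableLimit = memtableLimit;
    this.wal = [];            // stand-in for the on-disk log
    this.memtable = new Map();
    this.sstables = [];       // flushed tables, newest first
  }

  put(key, value) {
    this.wal.push([key, value]);    // durable record first
    this.memtable.set(key, value);  // then the in-memory table
    if (this.memtable.size >= this.memtableLimit) this.flush();
  }

  flush() {
    // Write the memtable out in key order as a new immutable table.
    const sorted = [...this.memtable].sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
    this.sstables.unshift(sorted);
    this.memtable = new Map();
    this.wal = []; // the flushed data no longer needs the log
  }

  get(key) {
    if (this.memtable.has(key)) return this.memtable.get(key);
    for (const table of this.sstables) { // newest table wins
      const hit = table.find(([k]) => k === key);
      if (hit) return hit[1];
    }
    return undefined;
  }
}

const lsm = new TinyLsm(2);
lsm.put("b", 1);
lsm.put("a", 2); // threshold reached: flushed as [["a",2],["b",1]]
lsm.put("a", 3); // newer value lives in the memtable
console.log(lsm.get("a"), lsm.get("b")); // 3 1
```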

How Reads Work

Search memtable first (fast, in-memory).
Check Level 0 SSTables, then Level 1, and so on.
To reduce disk reads, LSM-Trees use Bloom filters:

Memory-efficient probabilistic structure that can say a key is definitely not in an SSTable (false positives are possible, false negatives are not).
Avoids reading SSTables unnecessarily.
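
To make the Bloom filter idea concrete, here is a small sketch. The bit count, the number of hash functions, and the two simple string hashes combined by double hashing are all illustrative assumptions, not what any specific engine uses.

```javascript
// Sketch only: a fixed-size bit array probed at several hashed positions.
class BloomFilter {
  constructor(bitCount = 1024, hashCount = 3) {
    this.bitCount = bitCount;
    this.hashCount = hashCount;
    this.bitmap = new Uint8Array(Math.ceil(bitCount / 8));
  }

  // Derive hashCount bit positions from two base hashes (double hashing).
  positions(key) {
    let h1 = 0x811c9dc5;
    let h2 = 0;
    for (let i = 0; i < key.length; i++) {
      const c = key.charCodeAt(i);
      h1 = Math.imul(h1 ^ c, 0x01000193) >>> 0;
      h2 = (Math.imul(h2, 31) + c) >>> 0;
    }
    const out = [];
    for (let i = 0; i < this.hashCount; i++) {
      out.push(((h1 + Math.imul(i, h2 | 1)) >>> 0) % this.bitCount);
    }
    return out;
  }

  add(key) {
    for (const p of this.positions(key)) this.bitmap[p >> 3] |= 1 << (p & 7);
  }

  // false => the key was definitely never added; true => it may have been.
  mightContain(key) {
    for (const p of this.positions(key)) {
      if ((this.bitmap[p >> 3] & (1 << (p & 7))) === 0) return false;
    }
    return true;
  }
}

const filter = new BloomFilter();
filter.add("user:1");
console.log(filter.mightContain("user:1")); // true
```

A read would consult one such filter per SSTable and skip any file whose filter answers false.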

Compaction Strategies

Compaction ensures that:

Duplicate keys are merged, keeping the most recent value.
Deleted keys are purged, and their tombstones are dropped once no older SSTable can still hold the key.
Disk usage stays manageable.

Popular strategies:

Size-tiered compaction: Merge smaller SSTables into larger ones.
Leveled compaction: SSTables are organized in levels; each level contains non-overlapping key ranges.

Advantages of LSM-Trees

Extremely high write throughput.
Can handle datasets larger than memory.
Efficient range queries due to sorted SSTables.
Crash recovery is simple (append-only log + immutable SSTables).

Limitations

Reads may need to check multiple levels → slightly slower than in-memory hash indexes.
Compaction consumes CPU and I/O resources, but it can be scheduled in the background.

B-Trees

While LSM-Trees and SSTables are optimized for write-heavy workloads, B-Trees are the most common indexing structure used in relational databases and many key-value stores. They are optimized for balanced reads and writes with efficient range queries.

What is a B-Tree?

A B-Tree is a self-balancing tree where:

Nodes (pages) store keys and pointers to child nodes.
All leaf nodes are at the same depth.
Nodes are designed to match the size of disk pages (e.g., 4 KB) for efficient I/O.

Key Properties:

Keys in each node are sorted.
Each node has a branching factor (number of children).
Tree height is kept low, so reads involve few disk accesses.

How Reads Work

Start at the root node.
Compare the key with the keys in the node.
Follow the pointer to the child node covering the key range.
Repeat until reaching a leaf node, which contains the key or a pointer to its value.

Lookup complexity = O(log n), typically just a few disk page reads.
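
The descent can be sketched on a tiny hand-built tree. The node layout here is an assumption chosen for illustration (a B+-tree-style layout where internal nodes hold separator keys and only leaves hold values), and plain objects stand in for disk pages.

```javascript
// Sketch only: internal nodes hold separators; child i+1 covers keys >= keys[i].
function btreeSearch(root, key) {
  let node = root;
  while (!node.leaf) {
    let i = 0;
    while (i < node.keys.length && key >= node.keys[i]) i++; // pick the child range
    node = node.children[i]; // in a real engine: one page read
  }
  const i = node.keys.indexOf(key);
  return i === -1 ? undefined : node.values[i];
}

const tree = {
  leaf: false,
  keys: [10, 20],
  children: [
    { leaf: true, keys: [1, 5], values: ["one", "five"] },
    { leaf: true, keys: [10, 15], values: ["ten", "fifteen"] },
    { leaf: true, keys: [20, 30], values: ["twenty", "thirty"] },
  ],
};
console.log(btreeSearch(tree, 15)); // fifteen
```

With a branching factor in the hundreds, the same loop reaches any of many millions of keys in a handful of iterations, which is where the "few disk page reads" comes from.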

How Writes Work

Locate the leaf node for the key.
Insert the key and value.
If the node exceeds its capacity:

Split the node into two.
Update the parent node with the new key and pointer.
Repeat recursively if necessary.

Pages are updated in place, unlike LSM-Trees which append to files.
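
The split step is the interesting part of a write, so the sketch below covers only a single leaf: insert in sorted position, and if the leaf overflows, move the upper half to a new right sibling and report the separator key the parent would need to store. The node shape, the capacity parameter, and the function name are assumptions for the example; updating the parent and recursing are left out.

```javascript
// Sketch only: returns null if the leaf absorbed the write, or
// { separator, right } when the leaf had to be split.
function insertIntoLeaf(leaf, key, value, capacity) {
  let i = 0;
  while (i < leaf.keys.length && leaf.keys[i] < key) i++;
  if (i < leaf.keys.length && leaf.keys[i] === key) {
    leaf.values[i] = value; // existing key: overwrite in place
    return null;
  }
  leaf.keys.splice(i, 0, key);
  leaf.values.splice(i, 0, value);
  if (leaf.keys.length <= capacity) return null;

  // Overflow: keep the lower half here, move the upper half to a new leaf.
  const mid = Math.ceil(leaf.keys.length / 2);
  const right = {
    leaf: true,
    keys: leaf.keys.splice(mid),
    values: leaf.values.splice(mid),
  };
  return { separator: right.keys[0], right };
}

const page = { leaf: true, keys: [10, 20, 30], values: ["a", "b", "c"] };
const split = insertIntoLeaf(page, 25, "x", 3);
console.log(page.keys, split.separator, split.right.keys); // [ 10, 20 ] 25 [ 25, 30 ]
```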

Crash Safety and Concurrency

Writes in B-Trees may require multiple pages to be updated.
To avoid corruption, databases often use a Write-Ahead Log (WAL):
- Append changes to the log first.
- Apply changes to the tree.
- On crash, replay the WAL to restore consistency.
Concurrency is handled with latches (lightweight locks) to prevent inconsistent reads during updates.

Advantages of B-Trees

Efficient point lookups and range queries.
Low tree height → few disk reads.
Well-understood, widely implemented in relational databases.
Can handle in-place updates, avoiding temporary file creation.

Limitations

Random in-place page writes are typically slower than the sequential writes an LSM-Tree performs.
Complex concurrency control is needed for multiple writers.
Maintaining sorted order in-place can lead to fragmentation over time.

Comparison of Index Data Structures

Feature / Structure	Hash Index	SSTable	LSM-Tree	B-Tree
Key order	Unordered	Sorted	Sorted per SSTable	Sorted
Best use case	Exact key lookup, small number of keys in memory	Read-heavy workloads with mostly immutable data	Write-heavy workloads, large datasets	Balanced read/write, range queries
Read efficiency	Very fast (single hash lookup)	Moderate (sparse index + scan)	Moderate to high (memtable + multiple SSTables, optimized with Bloom filters)	High (O(log n), few page reads)
Write efficiency	Very high (append + update hash map)	Moderate (requires creating new SSTable)	Very high (sequential writes to memtable/SSTables)	Moderate (in-place updates, node splits)
Memory usage	High (hash map in memory)	Low (sparse in-memory index)	Low (memtable + Bloom filters)	Moderate (depends on tree nodes in memory)
Crash recovery	Use snapshot of hash maps + append-only log	Memtable + log for recent writes	Memtable + log; SSTables immutable	Write-ahead log (WAL) or copy-on-write
Range queries	Poor	Good	Good	Excellent
Concurrency	Easy (single writer)	Easy (append-only SSTables)	Easy (append-only + compaction in background)	Complex (requires locks/latches)
Disk layout	Segments of key-value pairs	Sorted immutable files	Hierarchy of SSTables with levels	Tree of fixed-size pages

Conclusion

Indexing is the backbone of efficient data retrieval in databases and storage engines. Choosing the right index depends on your workload:

Hash indexes are perfect for fast, exact key lookups with a small set of keys in memory.
SSTables provide sorted storage with efficient range queries, laying the groundwork for LSM-Trees.
LSM-Trees excel in write-heavy workloads and can scale to datasets much larger than memory while maintaining efficient reads.
B-Trees remain the go-to for relational databases, offering balanced read/write performance, excellent range queries, and mature (if complex) concurrency control.

Understanding the trade-offs of each structure helps developers and architects design systems tailored to their data access patterns.

GraphQL Deep Dive: How It Really Works Beyond the Basics

Zaki-goumri — Sat, 23 Aug 2025 00:56:47 +0000

1. Introduction

GraphQL is often marketed as the “REST killer,” but that’s not the full story. It’s a query language for APIs that gives clients the power to request exactly the data they need. Behind the hype, GraphQL is still just requests and responses over a transport protocol like HTTP. To use it effectively, you need to understand not just what GraphQL is, but how it actually works under the hood.

In this article, we’ll break down GraphQL transports (HTTP, WebSockets, even TCP/UDP), why queries are sent via POST, how browsers handle requests, caching challenges, rate limiting, security, and the cost of parsing. We’ll also use GitHub’s GraphQL API as a concrete example.

2. How GraphQL Moves Data

By default, GraphQL runs over HTTP POST. A client sends a JSON object containing three fields:

{
  "query": "...",
  "variables": { ... },
  "operationName": "..."
}

Why POST?

GET requests have no defined semantics for a body, and browsers will not send one. That means the query has to be encoded into the URL as a query parameter.
With nested queries, the URL quickly exceeds server or browser length limits (2,048 characters is a commonly cited figure), and the server may answer with 414 URI Too Long.
POST avoids this problem by letting you send the query, variables, and operation name inside the request body as JSON, where size limits are far more generous.

Best practice: always use POST with application/json. Some libraries support GET for persisted queries (where the client sends just a hash and the server looks up the query), but for general use, POST is the reliable, future-proof choice.

3. Fetching the Schema

One of GraphQL’s superpowers is that it’s self-describing. A client can send an introspection query to fetch the schema and understand what fields are available. Introspection queries go to the same endpoint as any other query, commonly /graphql.
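
As a small illustration, an introspection request is just an ordinary GraphQL request whose query selects the built-in __schema field. The helper name and the example URL below are hypothetical; __schema, queryType, types, name, and kind are standard introspection fields.

```javascript
// Sketch only: builds the options object you would hand to fetch().
function buildIntrospectionRequest() {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      // Ask for the root query type and the name/kind of every type.
      query: "{ __schema { queryType { name } types { name kind } } }",
    }),
  };
}

// Hypothetical usage against some GraphQL endpoint:
// fetch("https://example.com/graphql", buildIntrospectionRequest())
//   .then((res) => res.json())
//   .then((schema) => console.log(schema.data.__schema.types));
console.log(buildIntrospectionRequest().body);
```

Tools such as GraphiQL rely on a much larger version of this query to power autocomplete and documentation panels.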

4. Example: Making a Request in the Browser

Here’s how you’d query GitHub’s GraphQL API using fetch:

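// TOKEN is assumed to hold a GitHub personal access token defined elsewhere.
fetch("https://api.github.com/graphql", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${TOKEN}`,
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    // The query text and its variables travel separately in the JSON body.
    query: `
      query($login: String!) {
        user(login: $login) {
          name
          repositories(first: 3) {
            nodes { name }
          }
        }
      }
    `,
    variables: { login: "octocat" }
  })
})
  .then(res => {
    // Transport-level failures (401, 5xx) surface here, not in the errors array.
    if (!res.ok) throw new Error(`HTTP ${res.status}`);
    return res.json();
  })
  .then(console.log)
  .catch(console.error);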

The response looks like this:

{
  "data": {
    "user": {
      "name": "The Octocat",
      "repositories": {
        "nodes": [
          { "name": "Hello-World" },
          { "name": "Spoon-Knife" }
        ]
      }
    }
  }
}

5. GraphQL Over TCP, UDP, and WebSockets

The GraphQL spec is transport-agnostic. HTTP is just the most common choice.

TCP: You can send GraphQL queries over raw TCP sockets. Rare in practice, but possible for microservice communication.
UDP: Not realistic for request/response traffic: UDP guarantees neither delivery nor ordering, so you would have to rebuild both on top of it.
WebSockets: Widely used for subscriptions (real-time events). Example: a subscription to get new messages in a chat app.


subscription {
  newMessage {
    id
    content
    sender { name }
  }
}

Libraries like graphql-ws or Apollo Subscriptions handle this in production.

6. Why Caching is Hard

With REST, GET /users/1 maps to a specific resource and can be cached easily.

With GraphQL, everything goes through a single endpoint like /graphql. To cache, you’d need to parse the query and understand which fields it requests.
This means:

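No straightforward HTTP caching out of the box: browsers and CDNs generally do not cache POST responses, so ETag and Cache-Control help little.
Clients (Apollo, Relay) handle caching themselves.
A workaround is persisted queries: the client sends only a query hash (often via GET), so the URL identifies the response and normal HTTP caching can apply.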

7. Headers, Rate Limiting & REST-like Concerns

GraphQL still uses HTTP headers just like REST.

Authorization for tokens
Content-Type: application/json
Custom headers as needed

Rate limiting is more complex than REST:

REST: 1 request = 1 cost.
GraphQL: 1 request could be “light” or “heavy” depending on how much data it fetches.

GitHub solves this by assigning a point cost to queries. You get a quota of points per hour. Querying 100 repositories costs more than querying 1.

This means you need to design queries carefully:

Don’t request more fields than necessary.
Use pagination (first, after) to fetch large data sets gradually.
Cache results client-side when possible.

8. Complexity & Cost of Parsing

One of the trade-offs with GraphQL is that every request has to be parsed, validated, and executed — unlike REST, which usually just matches a URL to a handler.

Here’s what happens step by step:

Parse – The raw query string is converted into an AST (Abstract Syntax Tree).
- An AST is a structured tree representation of your query.
- Each field, argument, and nested selection becomes a node in that tree.
- Example:

{
  user(login: "octocat") {
    name
    repositories(first: 2) {
      nodes { name }
    }
  }
}

AST (simplified):

Operation: Query
 └── Field: user (args: login="octocat")
      ├── Field: name
      └── Field: repositories (args: first=2)
           └── Field: nodes
                └── Field: name

This structured tree makes it easier for the GraphQL engine to understand what’s being asked.
Validate – The server checks the AST against the schema:
- Does user exist as a field on the root Query type?
- Does repositories exist on the type that user returns?
- Are the arguments valid?
Execute – The server walks the AST node by node, calling the corresponding resolvers to fetch the data.
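
The execute step can be illustrated with a toy executor that walks a simplified AST like the one shown above. Everything here is an assumption for the sake of the sketch: the AST nodes are plain objects with name, args, and selections, resolvers are looked up by field name only (real servers key them by type and field), and fields without a resolver fall back to reading a property from the parent value.

```javascript
// Sketch only: walk the selections, call a resolver per field, recurse.
function executeSelections(selections, source, resolvers) {
  const result = {};
  for (const field of selections) {
    const resolve = resolvers[field.name];
    const value = resolve ? resolve(source, field.args ?? {}) : source[field.name];
    if (!field.selections || value == null) {
      result[field.name] = value ?? null;           // leaf field or null parent
    } else if (Array.isArray(value)) {
      result[field.name] = value.map((item) =>      // list: resolve each item
        executeSelections(field.selections, item, resolvers));
    } else {
      result[field.name] = executeSelections(field.selections, value, resolvers);
    }
  }
  return result;
}

// Hypothetical data and resolvers for the example query.
const users = {
  octocat: { name: "The Octocat", repos: [{ name: "Hello-World" }, { name: "Spoon-Knife" }, { name: "test-repo" }] },
};
const resolvers = {
  user: (_root, { login }) => users[login] ?? null,
  repositories: (user, { first }) => ({ nodes: user.repos.slice(0, first) }),
};
const ast = [{
  name: "user", args: { login: "octocat" }, selections: [
    { name: "name" },
    { name: "repositories", args: { first: 2 }, selections: [
      { name: "nodes", selections: [{ name: "name" }] },
    ] },
  ],
}];
console.log(JSON.stringify(executeSelections(ast, {}, resolvers)));
// {"user":{"name":"The Octocat","repositories":{"nodes":[{"name":"Hello-World"},{"name":"Spoon-Knife"}]}}}
```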

Why This Is Costly
Parsing and validation take CPU time compared to REST’s simple “route → handler” model.
Deep or malicious queries can blow up execution time (friends { friends { friends ... }}).
Servers must defend themselves against expensive queries.

Mitigations

Query depth limiting – block queries that go too deep.
Query complexity analysis – assign a “cost score” to queries.
Persisted queries – accept only known queries identified by a hash; the server keeps the query text (and can cache its parsed, validated form), so repeat requests skip that work.
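
Depth limiting is the easiest of these to sketch. The example below measures nesting depth on a simplified AST (plain objects with name and selections, not the AST of any real GraphQL library) and rejects queries above a limit; the function names and the limit are illustrative.

```javascript
// Sketch only: depth = longest chain of nested fields.
function depthOf(selections) {
  let deepest = 0;
  for (const field of selections) {
    deepest = Math.max(deepest, 1 + depthOf(field.selections ?? []));
  }
  return deepest;
}

function assertDepthAllowed(selections, maxDepth) {
  const depth = depthOf(selections);
  if (depth > maxDepth) {
    throw new Error(`Query depth ${depth} exceeds limit ${maxDepth}`);
  }
  return depth;
}

// user { name repositories { nodes { name } } }
const query = [{
  name: "user", selections: [
    { name: "name" },
    { name: "repositories", selections: [
      { name: "nodes", selections: [{ name: "name" }] },
    ] },
  ],
}];
console.log(depthOf(query)); // 4
```

A server would run such a check after parsing and before execution, so an abusive query is rejected without touching any resolver.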

9. Security Concerns

GraphQL opens the door to:

DoS via expensive queries.
Data exposure if introspection is left on in production.
Injection attacks if resolvers are insecure.

Best practices:

Disable introspection in prod (unless you need it).
Add query depth/cost limits.
Sanitize inputs inside resolvers.

10. Error Handling in GraphQL

Error handling in GraphQL works differently than in REST. With REST, if something goes wrong you usually get an HTTP error code (400, 404, 500, etc.) and a message. In GraphQL, responses are always wrapped in JSON, and errors are returned in a dedicated errors field.

Example
If you send a query:

{
  user(login: "does-not-exist") {
    name
  }
}

The response might look like:

{
  "data": {
    "user": null
  },
  "errors": [
    {
      "message": "Could not find user with login 'does-not-exist'",
      "locations": [{ "line": 2, "column": 3 }],
      "path": ["user"]
    }
  ]
}

Notice a few things:

The data field is still present (with null for the invalid part).

The errors array contains messages, where they happened in the query (locations), and which path failed (path).

The HTTP status is still 200 OK because technically the request was valid, even though part of it failed.

Types of Errors

Validation errors – Query doesn’t match the schema (wrong field, wrong argument).

Execution errors – Resolver fails (e.g., database issue, missing resource).

Partial failures – Some fields resolve successfully, others fail. This is common and often useful — the client can still render partial data.

Best Practices

Don’t rely solely on HTTP codes. Expect errors in the response body.

Add error extensions – You can attach custom fields to errors (e.g., error codes, internal tracking IDs). Example:

{
  "errors": [
    {
      "message": "Not authorized",
      "extensions": { "code": "UNAUTHENTICATED" }
    }
  ]
}

Monitor for abuse – Since HTTP 200 is returned even on logical errors, API monitoring should inspect both data and errors.

Combine with HTTP status codes – For things like auth failures or rate limits, it’s common to still return an HTTP status code (401 Unauthorized, 429 Too Many Requests) alongside the GraphQL error payload.
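
On the client side, these practices boil down to always reading both data and errors. A minimal sketch of such a helper follows; the function name and the shape of the returned object are assumptions, while message, path, and extensions are the fields shown in the examples above.

```javascript
// Sketch only: normalize a parsed GraphQL response body.
function readGraphQLResult(body) {
  const errors = (body.errors ?? []).map((error) => ({
    message: error.message,
    path: (error.path ?? []).join("."),    // e.g. "user"
    code: error.extensions?.code ?? null,  // custom code, if the server sets one
  }));
  return {
    ok: errors.length === 0,
    data: body.data ?? null, // may hold partial data even when errors exist
    errors,
  };
}

const result = readGraphQLResult({
  data: { user: null },
  errors: [{ message: "Not authorized", path: ["user"], extensions: { code: "UNAUTHENTICATED" } }],
});
console.log(result.ok, result.errors[0].code); // false UNAUTHENTICATED
```

Keeping data alongside errors lets the caller decide whether a partial result is still worth rendering.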

11. When to Use and When Not To

Use GraphQL when:

You have complex UIs that need nested data.
Mobile clients need efficient, custom responses.
You’re aggregating multiple backends into one schema.

Avoid GraphQL when:

You only need simple CRUD.
You rely heavily on CDN caching.
Your team isn’t ready for the added complexity.

12. Wrap-Up

GraphQL is powerful, but not a silver bullet. It trades simplicity and caching for flexibility and efficiency.

Over HTTP POST by default, but can run over WebSockets or TCP.
Harder to cache, harder to rate limit, more costly to parse.
Still uses headers, auth, and other REST-like patterns.
Great for complex data fetching, risky for simple APIs.

Think of GraphQL not as a replacement for REST, but as an option in your toolbox to be used when its strengths outweigh its complexity.

Designing Data-Intensive Applications: A Summary of Reliability, Scalability, and Maintainability

Zaki-goumri — Fri, 21 Feb 2025 11:01:07 +0000

Welcome to the first article in my series summarizing one of the greatest books on software engineering: Designing Data-Intensive Applications by Martin Kleppmann. This series aims to distill key concepts from the book, providing a quick refresher for those familiar with the material and a fast-paced introduction for those who prefer concise insights over reading entire books.

In this article, we’ll explore three fundamental concerns in software engineering: reliability, scalability, and maintainability. These are often referred to as non-functional requirements or quality attributes. They describe how a system should behave rather than what it should do, guiding its design, development, and operation.

Modern data systems often combine multiple tools to handle massive workloads. For example:

Message queues like RabbitMQ, Kafka, and IBM MQ ensure reliable communication between services.
In-memory databases like Redis provide low-latency access to frequently accessed data.
APIs abstract away implementation details, presenting a clean interface to clients.

When designing a data system or service, several questions arise:

How do you ensure data remains correct and complete, even in the face of failures?
How do you handle edge cases, such as network partitions or hardware failures?
How do you provide good performance to clients while optimizing costs?

These questions highlight the importance of reliability, scalability, and maintainability in system design. The sections below cover each of these concepts in turn, starting with reliability.

Reliability

Reliability means the system continues to function correctly, even in the face of faults. A reliable system ensures that data is accurate, complete, and available when needed. Reliability depends on many factors:

Design Quality: Poor design or lack of proper planning can lead to frequent failures. For example, a system without fault tolerance may crash under unexpected conditions.
Hardware Quality: Low-quality components or wear and tear can cause breakdowns. Redundancy (e.g., using multiple servers) can mitigate this risk.
Software Bugs: Errors in the software code can lead to crashes or malfunctions. Rigorous testing and code reviews are essential to minimize bugs.
Maintenance: Lack of regular updates, fixes, or testing can reduce reliability. Proactive maintenance ensures the system stays robust over time.
Workload: Overloading a system beyond its capacity can cause failures. Proper capacity planning and load testing are critical.
External Conditions: Environmental factors like temperature, power surges, or network issues can affect performance. Designing for resilience (e.g., backup power supplies) helps mitigate these risks.
Redundancy: A lack of backup systems or fail-safes can make a system less reliable. Redundancy ensures that failures in one component don’t bring down the entire system.

How to Improve Reliability

Routine Maintenance: Keep systems up-to-date and modernized through regular updates and patches.
Redundancy: Implement backup systems to prevent component failures from halting processes.
Quality Control: Test system changes thoroughly before deploying them to production.
Monitoring and Analysis: Use comprehensive data collection and analysis to understand system reliability and performance.
Incident Communication: Improve communication during incidents to reduce response and recovery time.

Scalability

Scalability is the capacity of a system to support growth or manage an increasing volume of work. A scalable system can handle more users, data, and traffic without sacrificing speed or reliability.

Why Scalability Matters

Managing Growth: Scalable systems can grow with your business, accommodating more users and data without performance degradation.
Improving Performance: By distributing the load across multiple servers or resources, scalable systems achieve faster processing speeds and better response times.
Ensuring Availability: Scalability ensures systems remain operational even during traffic spikes or component failures.
Cost-Effectiveness: Scalable systems adjust resources dynamically, avoiding over-provisioning and reducing costs.
Encouraging Innovation: Scalability lowers infrastructure barriers, enabling the development of new features and services.

Measuring Scalability

Scalability is not a one-dimensional metric—it depends on the specific load parameters relevant to the system. These parameters vary depending on the system’s purpose. For example:

Web Server: Requests per second.
Database: Ratio of reads to writes, number of simultaneous active users, or cache hit rate.
Chat System: Number of messages sent per second or number of concurrent connections.

To describe scalability, you need to define the load parameters and their distribution. For example:

A system might handle 10,000 requests per second, but if 90% of those requests are for the same piece of data (e.g., a popular video or post), the load distribution is skewed, and the system must be designed to handle such cases.

How to Scale

Before scaling, you need to understand your system’s load parameters and performance goals. Then, choose a scaling strategy:

Vertical Scaling (Scaling Up): Add more resources (e.g., CPU, memory, disk) to a single machine. This is simple but limited by the machine’s maximum capacity.
Horizontal Scaling (Scaling Out): Add more machines to distribute the load. This is more complex but offers greater scalability and fault tolerance.

Other techniques include:

Partitioning (Sharding): Split data across multiple machines to distribute the load.
Replication: Store copies of data on multiple machines for fault tolerance and read scalability.
Caching: Store frequently accessed data in fast storage (e.g., Redis) to reduce latency.
Asynchronous Processing: Use message queues (e.g., Kafka) to decouple tasks and handle them in the background.
Autoscaling: Automatically add or remove resources based on load.

Maintainability

Maintainability is the ability of a system to undergo repairs and modifications while remaining operational. A maintainable system is easy to understand, modify, and operate over time.

Principles of Maintainability

Operability: Make it easy for operations teams to keep the system running smoothly. This includes:

Providing good monitoring and alerting tools.
Documenting runbooks for common operational tasks.
Designing for automation (e.g., automated backups and scaling).

Simplicity: Keep the system as simple as possible, avoiding unnecessary complexity. This involves:

Using abstractions to hide implementation details.
Following the KISS principle (Keep It Simple, Stupid).
Refactoring regularly to remove technical debt.

Evolvability: Make it easy to adapt the system to changing requirements. This requires:

Designing for modularity (e.g., microservices, well-defined interfaces).
Using backward-compatible changes (e.g., versioned APIs).
Writing tests to ensure changes don’t break existing functionality.

How to Improve Maintainability

Good Documentation: Document the system’s architecture, APIs, and operational procedures.
Modular Design: Break the system into small, independent components with clear interfaces.
Automated Testing: Write unit, integration, and end-to-end tests to catch bugs early.
Monitoring and Observability: Use metrics, logs, and distributed tracing to detect and diagnose issues quickly.
Version Control and CI/CD: Use version control (e.g., Git) and CI/CD pipelines to manage changes efficiently.

Conclusion

In this article, we explored the three pillars of system design: reliability, scalability, and maintainability. These non-functional requirements are critical for building systems that are robust, efficient, and adaptable over time. By understanding and applying these principles, you can design systems that meet the demands of modern applications.

In the next article, we’ll dive deeper into data models and query languages, exploring how different models (e.g., relational, document, graph) shape the design of data-intensive systems. Thank you for reading, and if you have any feedback or suggestions, please let me know in the comments!!

Blockchain

Zaki-goumri — Wed, 05 Feb 2025 08:10:58 +0000

Blockchain in express

Zaki-goumri — Wed, 05 Feb 2025 08:10:29 +0000

Passport js

Zaki-goumri — Thu, 12 Dec 2024 09:24:20 +0000
