DEV Community: Umang Sinha

how spaced repetition actually works: the sm-2 algorithm

Umang Sinha — Sun, 08 Mar 2026 07:50:31 +0000

most people study inefficiently.

we review things too early - wasting time. or too late - after we've already forgotten them.

the real question is simple: when is the optimal moment to review something?

in 1987, a polish researcher named piotr woźniak proposed a surprisingly simple answer. he built an algorithm that schedules reviews so that you see information just before you forget it. that algorithm is called sm-2, and variations of it power spaced-repetition tools used by millions of people today, including systems like anki.

the fascinating part is that the core idea is extremely small. the entire algorithm fits in a few variables and a short formula.

but to understand why it works, we first need to look at how memory behaves.

human memory decays quickly. in the late 19th century, psychologist hermann ebbinghaus studied how quickly people forget newly learned information. his experiments produced what we now call the "forgetting curve" - a model showing that memory retention drops rapidly over time unless the information is reinforced.

if you learn something today and never revisit it, the probability of remembering it decreases dramatically over the next few days.
this means that reviewing information randomly is inefficient. review too early and you're spending time on something you still remember well. review too late and you've already forgotten it.

the optimal strategy is to review something right before you forget it.

this idea is the foundation of spaced repetition.

instead of reviewing information repeatedly in a short period of time, spaced repetition increases the gap between reviews. a typical review schedule might look something like this:

day 0 → day 1 → day 3 → day 7 → day 16 → day 35 → day 90

each successful recall strengthens the memory, allowing the next review to be pushed further into the future.

however, there is a problem with using a fixed schedule. not all information has the same difficulty.

some facts stick instantly. others require repeated effort to remember. a system that treats all information the same will inevitably be inefficient.

this is where sm-2 becomes interesting.

instead of scheduling reviews globally, sm-2 assigns a small learning model to every flashcard. each card tracks a few pieces of information about your interaction with it, and the scheduling decisions are made based on that history.

the algorithm keeps track of three things.

the repetition count, which is the number of successful reviews in a row the card has had.
the interval, which represents how many days should pass before the card appears again.
the easiness factor, which represents how easy the card is for you personally (a higher value means an easier card).

when a card is first introduced, its easiness factor usually starts at 2.5. this value will change over time depending on how well you recall the card.

whenever you review a card, you grade your recall using a score from 0 to 5.

a score of 5 means the answer was recalled perfectly.
a score of 4 means correct but with hesitation.
a score of 3 means correct but difficult to recall.
a score of 2 or lower indicates failure.

if the score is below 3, the algorithm assumes the card has effectively been forgotten. the repetition counter resets and the interval goes back to one day so that the card can be relearned quickly.

if the recall was successful, the interval grows.

the first successful review schedules the next review for 1 day later.

the second successful review schedules it for 6 days later.

after that, the interval is calculated using a simple rule:

i(n) = i(n−1) × ef

in other words, the next interval is the previous interval multiplied by the card's easiness factor.

if the ef stays at 2.5 and fractional days are dropped, the intervals look like this:

1 day → 6 days → 15 days → 37 days → 92 days

the intervals expand rapidly, reflecting the fact that well-learned information can be retained for much longer periods.
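as a rough sketch of the interval rule, the helper below lists the first few intervals for a card whose ef never changes. the name schedule is made up for this example, and truncating fractional days is an assumption chosen to match the sequence shown above.

```go
package main

import "fmt"

// schedule returns the first n review intervals (in days) for a card whose
// easiness factor stays fixed at ef. Fractional days are truncated, which is
// an assumption made for this sketch.
func schedule(ef float64, n int) []int {
	intervals := make([]int, 0, n)
	interval := 0
	for rep := 0; rep < n; rep++ {
		switch rep {
		case 0:
			interval = 1 // first successful review
		case 1:
			interval = 6 // second successful review
		default:
			interval = int(float64(interval) * ef) // i(n) = i(n-1) × ef
		}
		intervals = append(intervals, interval)
	}
	return intervals
}

func main() {
	fmt.Println(schedule(2.5, 5)) // prints [1 6 15 37 92]
}
```

a lower ef flattens the curve: the same call with 1.3 grows the interval by only 30% per review after the first two.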

but the real cleverness of sm-2 lies in how the easiness factor itself is updated.

after each review, the ef is adjusted based on the recall quality using this formula:

ef = ef + (0.1 - (5 - q) × (0.08 + (5 - q) × 0.02))

here, q represents the quality score given during the review.

this formula has an intuitive effect. a perfect recall (q = 5) raises the ef by 0.1, causing intervals to grow faster, and a score of 4 leaves it unchanged.
a difficult recall (q = 3) lowers the ef by 0.14, making the algorithm schedule reviews more frequently.

to prevent intervals from collapsing completely, the algorithm enforces a minimum value:

ef ≥ 1.3
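to see what the update does for each passing grade, here is a small sketch that applies the formula and the 1.3 floor. the function name updateEF and the constant minEF are assumptions for illustration.

```go
package main

import "fmt"

// minEF is the floor the algorithm enforces on the easiness factor.
const minEF = 1.3

// updateEF applies the sm-2 easiness update for a quality score q (0 to 5)
// and clamps the result to minEF.
func updateEF(ef float64, q int) float64 {
	d := float64(5 - q)
	ef += 0.1 - d*(0.08+d*0.02)
	if ef < minEF {
		ef = minEF
	}
	return ef
}

func main() {
	for q := 5; q >= 3; q-- {
		fmt.Printf("q=%d: ef 2.50 -> %.2f\n", q, updateEF(2.5, q))
	}
}
```

starting from 2.5, a 5 gives 2.60, a 4 gives 2.50, and a 3 gives 2.36. a card already at 1.3 stays at 1.3 after another 3.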

with just these rules, the algorithm adapts the schedule for every card individually.

easy cards gradually disappear into long review intervals. difficult cards keep resurfacing until they stabilize.

the elegance of sm-2 is that it achieves this adaptive behavior with almost no complexity. the entire system requires only a few variables and a small update formula. there is no machine learning, no probabilistic modeling, and no large datasets involved.

despite that simplicity, it works remarkably well.

even decades after its creation, many spaced-repetition systems still rely on sm-2 or slight variations of it. newer algorithms attempt to model memory more precisely using statistical techniques, but sm-2 remains popular because it strikes a great balance between effectiveness and simplicity.

in fact, you can implement the entire scheduling logic in just a few lines of code. here is what that looks like in go:

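// Card holds the scheduling state for a single flashcard.
type Card struct {
    Repetition int     // successful reviews in a row
    Interval   int     // days until the next review
    EF         float64 // easiness factor, starts at 2.5
}

// Review updates the card in place for a recall quality from 0 to 5.
func Review(card *Card, quality int) {

    // failed recall: start over and leave the EF untouched
    if quality < 3 {
        card.Repetition = 0
        card.Interval = 1
        return
    }

    // successful recall: 1 day, then 6 days, then previous interval × EF
    if card.Repetition == 0 {
        card.Interval = 1
    } else if card.Repetition == 1 {
        card.Interval = 6
    } else {
        card.Interval = int(float64(card.Interval) * card.EF)
    }

    card.Repetition++

    // adjust the EF based on how hard the recall was
    ef := card.EF + (0.1 - float64(5-quality)*(0.08+float64(5-quality)*0.02))

    // never let the EF drop below 1.3
    if ef < 1.3 {
        ef = 1.3
    }

    card.EF = ef
}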

with this small piece of logic, you can build the core of a spaced-repetition scheduler.

sm-2 is a nice reminder that powerful systems don’t always require complicated algorithms. sometimes a simple model, applied consistently, is enough to produce surprisingly effective behavior.
in this case, a few lines of math ended up shaping how millions of people learn.

originally published at: https://www.umangsinha.in/blog/the-sm2-algorithm

🚀 Released bitbloom v1.0.0: a high-performance Bloom filter library in Go. In this post, I cover what Bloom filters are, how I built one from scratch, and benchmarks showing 2M+ ops/sec. Space-efficient, blazing-fast. 🔗 Full breakdown inside:

Umang Sinha — Fri, 13 Jun 2025 13:53:49 +0000


Probabilistic Data Structures in Go: Building and Benchmarking a Bloom Filter

Umang Sinha — Fri, 13 Jun 2025 13:46:34 +0000

Imagine you're building a high-traffic web service that relies heavily on a distributed cache like Redis. For every incoming request, your service first checks the cache to avoid expensive database lookups. Sounds efficient, right?

But here’s a subtle problem at scale: Many of the keys you look up may not be in the cache!

Each of these negative lookups still involves a network round-trip to Redis, which, while fast, adds up when you're processing hundreds of thousands of misses per second.

Wouldn’t it be great if you could avoid those pointless lookups altogether?

This is where a Bloom filter comes in - a clever probabilistic data structure that lives in your application’s memory. It helps you quickly answer the following question:

"Is this key definitely not in the cache?"

If the Bloom filter says “no” → skip the Redis call entirely.

If it says “maybe” → go ahead and check Redis as usual. (It's important to understand that the 'maybe' case can sometimes lead to a false positive, but we'll delve into that later.)

It’s fast, tiny, and wildly effective at reducing unnecessary lookups in high-throughput systems. In fact, services at Google, Facebook, LinkedIn, and CDNs like Cloudflare all use Bloom filters to optimize performance at scale.

In this article, we’ll:

Break down how Bloom filters work (and the math behind them)
Walk through a full implementation in Go
Explore practical use cases, tuning strategies, and limitations

Whether you're optimizing caching logic or reducing database hits - Bloom filters are a tool worth knowing.

What is a Bloom Filter?

A Bloom filter is a probabilistic data structure used to test whether an element is a member of a set. It can return two possible answers:

“Definitely not present”
“Possibly present”

It’s fast, memory-efficient, and beautifully suited for large-scale systems where precision can be traded for performance.

Why “Probabilistic”?

Bloom filters are “probabilistic” because they allow false positives. That is, they might say an element is present even when it’s not. But the beauty is, they never give false negatives. If the filter says something isn't there, you can trust it.

This simple guarantee unlocks powerful optimizations in real-world systems.

A Bloom filter starts with:

A bit array of size m, all initialized to 0
k different hash functions

To add an element:

Hash the element k times
Each hash gives you an index into the bit array
Set each of those k bits to 1

To check for an element:

Hash it the same way
If any of those k bits are 0, the element is definitely not in the set
If all are 1, then it might be present

The more elements you add, the more bits get set, which increases the chances of collisions and hence, false positives.

Real-World Analogy:

Imagine you're trying to create a new Gmail account.

As soon as you enter your desired email address, Google instantly tells you whether it’s available or already taken. That check feels instantaneous, but behind the scenes, it needs to search through hundreds of millions of existing email addresses.

Now, Google could query a database directly every time someone enters an email, but:

That would be costly at scale
It would expose the system to user enumeration risks if someone tries to brute-force existing usernames

So, how do you check availability quickly and securely?

One smart solution is to use a bloom filter.

Google can maintain a bloom filter that contains all existing Gmail usernames, either fully or partially synced across regions.
When you enter a new username, the frontend or a lightweight backend service checks the bloom filter first: If the filter says “definitely not present”, the username is available. If it says “might be present”, Google can then do a deeper, more secure database lookup to confirm.

This two-tiered approach:

Reduces load on internal services
Speeds up user feedback
Avoids leaking user data through timing differences

It’s a perfect example of how Bloom filters trade a small chance of a false positive for dramatic speed and scale benefits, especially in global systems like Google’s.

How Bloom Filters Work Internally:

Bloom filters are elegant data structures that offer a blend of mathematical simplicity, space efficiency, and speed. In this section, we’ll explore how they actually work under the hood.

How to add an item to the bloom filter?

A bloom filter starts with a fixed-size bit array of length m, with all bits initialized to 0.

Each bit in this array records whether at least one inserted item has hashed to that position.

To insert an item into a bloom filter:

Hash the item using k different hash functions.
Each hash maps the item to a position i in the bit array.
Set the bits at all k positions to 1.

Example: inserting "bloom" with k = 3 might yield indices 1, 4, and 6. After insertion, bits 1, 4, and 6 are set to 1 and every other bit is still 0.

How to check membership?

To check if an item might exist in the bloom filter:

Hash the item with the same k hash functions.
Inspect the k positions in the bit array.

If any of the bits is 0 → the item is definitely not in the set.
If all are 1 → the item is possibly in the set.

So, bloom filters give you a guaranteed no or a maybe yes.
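The insert and lookup steps above can be sketched in a few lines of Go. This is a minimal, non-thread-safe illustration and not bitbloom's code: the Filter type, the []bool storage, and the trick of deriving k hashes by prefixing a seed byte to an FNV-1a hash are all assumptions made to keep the example short.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// Filter is a minimal Bloom filter sketch (hypothetical type, not bitbloom's).
type Filter struct {
	bits []bool // m bits; []bool trades space for readability
	k    int    // number of hash functions
}

func NewFilter(m, k int) *Filter {
	return &Filter{bits: make([]bool, m), k: k}
}

// index computes the i-th hash of data by prefixing a seed byte,
// then maps it to a position in the bit array.
func (f *Filter) index(i int, data []byte) int {
	h := fnv.New64a()
	h.Write([]byte{byte(i)})
	h.Write(data)
	return int(h.Sum64() % uint64(len(f.bits)))
}

// Add sets the k bits that data hashes to.
func (f *Filter) Add(data []byte) {
	for i := 0; i < f.k; i++ {
		f.bits[f.index(i, data)] = true
	}
}

// Test reports false if data was definitely never added,
// and true if it is possibly present.
func (f *Filter) Test(data []byte) bool {
	for i := 0; i < f.k; i++ {
		if !f.bits[f.index(i, data)] {
			return false
		}
	}
	return true
}

func main() {
	f := NewFilter(1024, 3)
	f.Add([]byte("bloom"))
	fmt.Println(f.Test([]byte("bloom"))) // true: an added item is never reported missing
}
```

A real implementation would pack the bits into machine words rather than using one bool per bit.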

If we query "bloom" again, we’ll get the same indices 1, 4, and 6. All are set to 1, so the bloom filter returns true, meaning "bloom" might be present. And in this case, it’s correct.

Now, let’s insert another word - "filter". Suppose its hash functions return indices 4, 5, and 7. The updated bit array now has bits 1, 4, 5, 6, and 7 set to 1.

Notice the collision at index 4? Both "bloom" and "filter" caused that bit to be set. There’s no way to tell whether it was set by "bloom" or "filter". This ambiguity is at the heart of how bloom filters trade precision for speed and space.

False positives:

Since Bloom filters rely on shared bit positions, it’s possible for unrelated items to appear present just by chance.

Imagine querying the word "maybe", whose hash functions return the indices 1, 4, and 7.

In our case, "maybe" would pass the membership check because the bits it checks were already set by other words (index 1 by "bloom", index 4 by both, and index 7 by "filter"), even though we never inserted "maybe".

This is called a false positive - the filter falsely claims the item might exist.

Bloom filters never produce false negatives, but they can produce false positives.
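The running example can be replayed directly with the indices from the text. In this sketch the hash functions are skipped and the positions are hard-coded; the array length of 8 and the helper names set and allSet are assumptions.

```go
package main

import "fmt"

// set marks the given positions in the bit array.
func set(bits []bool, idx ...int) {
	for _, i := range idx {
		bits[i] = true
	}
}

// allSet reports whether every given position is 1.
func allSet(bits []bool, idx ...int) bool {
	for _, i := range idx {
		if !bits[i] {
			return false
		}
	}
	return true
}

func main() {
	bits := make([]bool, 8) // length 8 is assumed for illustration

	set(bits, 1, 4, 6) // insert "bloom"
	set(bits, 4, 5, 7) // insert "filter"

	// "maybe" was never inserted, yet all of its bits are set.
	fmt.Println(allSet(bits, 1, 4, 7)) // true: a false positive

	// A hypothetical word hashing to 0, 4, 7 is rejected because bit 0 is still 0.
	fmt.Println(allSet(bits, 0, 4, 7)) // false: definitely not present
}
```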

This tradeoff is exactly what makes Bloom filters powerful: they allow blazing-fast lookups and save memory, in exchange for a small, tunable error rate.

The math behind it:

To understand the false positive rate of a Bloom filter, we need to answer one question: What is the probability that all k hash bits for a queried item are set to 1, even though the item was never inserted?

Let’s break it down step by step.

Step 1: Probability a bit is still 0 after n insertions

Each insertion sets k bits in the array. So, after inserting n items, we’ve made kn bit assignments (some of which might overlap).

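Each bit has this probability of remaining unset after one random write:

$$1 - \frac{1}{m}$$

After kn independent hashings, the probability that a specific bit is still 0 becomes:

$$\left(1 - \frac{1}{m}\right)^{kn}$$

Using the identity:

$$\lim_{m \to \infty} \left(1 - \frac{1}{m}\right)^{m} = e^{-1}$$

for large m, we can approximate:

$$\left(1 - \frac{1}{m}\right)^{kn} \approx e^{-kn/m}$$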

Step 2: Probability a bit is 1

So, the probability that a bit is set to 1 is:

$$1 - e^{-kn/m}$$

Step 3: False Positive Probability

Now, for a false positive to occur, all k bits checked during a query must be 1 (even though the item was never inserted). Assuming the hash functions are independent:

$$p \approx \left(1 - e^{-kn/m}\right)^k$$

This is the false positive rate - the core metric in bloom filter design.

What Affects the False Positive Rate?

n → number of items inserted
m → size of the bit array
k → number of hash functions

The goal is to choose m and k wisely for a given n so that the false positive rate stays acceptably low.
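To get a feel for how n, m, and k interact, the standard approximation p ≈ (1 − e^(−kn/m))^k can be evaluated directly. The function name falsePositiveRate and the sample numbers are assumptions chosen for illustration.

```go
package main

import (
	"fmt"
	"math"
)

// falsePositiveRate evaluates p ≈ (1 - e^(-kn/m))^k for n inserted items,
// m bits, and k hash functions.
func falsePositiveRate(n, m, k float64) float64 {
	return math.Pow(1-math.Exp(-k*n/m), k)
}

func main() {
	// A filter sized for 10,000 items at roughly 1%.
	fmt.Printf("n=10000: %.4f\n", falsePositiveRate(10000, 95851, 7))
	// The same filter after twice as many insertions as planned.
	fmt.Printf("n=20000: %.4f\n", falsePositiveRate(20000, 95851, 7))
}
```

The first call lands close to 1%. The second shows how quickly accuracy degrades once the filter holds more items than it was sized for.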

Optimal bit array size m for target false positive rate p:

$$m = -\frac{n \ln p}{(\ln 2)^2}$$

There’s even an optimal value of k for given m and n:

$$k = \frac{m}{n} \ln 2$$

You can precompute m and k when initializing your Bloom filter if you know n and your acceptable p. This minimizes the false positive rate.
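As a sketch of that precomputation, the standard sizing formulas m = −n ln p / (ln 2)² and k = (m/n) ln 2 translate into two small helpers. The names optimalM and optimalK, and the choice to round m up and k to the nearest integer, are assumptions and not necessarily what any particular library does.

```go
package main

import (
	"fmt"
	"math"
)

// optimalM returns the bit-array size for n expected items and a target
// false positive rate p, rounded up to a whole number of bits.
func optimalM(n int, p float64) int {
	return int(math.Ceil(-float64(n) * math.Log(p) / (math.Ln2 * math.Ln2)))
}

// optimalK returns the number of hash functions for m bits and n items,
// rounded to the nearest integer and never less than 1.
func optimalK(m, n int) int {
	k := int(math.Round(float64(m) / float64(n) * math.Ln2))
	if k < 1 {
		k = 1
	}
	return k
}

func main() {
	n, p := 10000, 0.01
	m := optimalM(n, p)
	k := optimalK(m, n)
	fmt.Printf("n=%d p=%.2f -> m=%d bits (%.1f bits/item), k=%d\n",
		n, p, m, float64(m)/float64(n), k)
}
```

For 10,000 items at 1%, this works out to a little under 10 bits per item and 7 hash functions.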

Asymptotic Complexity of bloom filters:

Despite their probabilistic nature, bloom filters offer impressive performance guarantees. Let’s break them down:

Time complexity:

Insert and lookup each take O(k) time. Both operations involve:

Running k hash functions
Accessing k bits in the bit array

Since k is typically a small constant (e.g., 7-10), both insert and lookup are effectively O(1) in practice.

Space complexity:

A Bloom filter with:

m bits of storage
n inserted elements
k hash functions

Uses a total of O(m) bits of memory.

In summary:

A larger bit array (m) reduces the false positive rate but increases memory usage.
More hash functions (k) improve precision up to a point, but after that, they degrade performance and increase collisions.
The number of inserted elements (n) directly affects accuracy - a bloom filter degrades as it gets “full”.

Bloom filters strike a beautiful balance: tiny space, lightning-fast access, and tunable accuracy.

Implementing a Bloom Filter in Go

There’s no better way to understand Bloom filters than to implement one. Instead of building one from scratch here, we’ll walk through the internals of bitbloom - a fast, thread-safe Bloom filter package I wrote in Go with performance and simplicity in mind.

We’ll dissect the core functionality: initialization, hashing, insertion, lookups, and more, covering not just what the code does, but why it's designed that way. The goal is to showcase how a clean implementation can scale while remaining easy to reason about.

Creating a Bloom Filter: bitbloom.New()

This initializes the filter for 10,000 expected elements and a 1% false positive rate. The library handles all the math for you.

Internally:

Bitset size m and number of hash functions k are computed based on standard formulas.
Values are clamped to safe ranges.
The hash functions are seeded deterministically.

This ensures accuracy without wasting memory or CPU.

Optimal Parameter Calculations:

Bloom filters rely on mathematical precision for tuning space and accuracy:

$$m = -\frac{n \ln p}{(\ln 2)^2}, \qquad k = \frac{m}{n} \ln 2$$

In bitbloom, these are implemented as:

These help you reason about the internal capacity of your filter and are exported so users can use them manually if needed.

Adding Items: Add(data []byte)

Adding an item to the filter involves hashing the data k times and setting the resulting k positions in the bitset to 1. Here’s the simplified flow:

We lock with a write mutex since the bitset will be mutated.
The bf.hasher.Hashes() function returns k positions using a MurmurHash-based mechanism.
We set those bit positions.
The item count is incremented for optional introspection.

Testing Items for membership: Test(data []byte)

Checking if an item might be in the filter is similar - we compute the same k hash positions and verify that all are set:

We use a read lock for high concurrency.
If any of the positions are unset, we’re certain the item wasn’t added.
If all positions are set, we might have seen the item before - a false positive is possible but bounded by p.
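The locking pattern described for Add and Test can be sketched as follows. This is not bitbloom's source: the SafeFilter type, its fields, and the FNV-based positions helper are stand-ins invented for this example (bitbloom itself uses a MurmurHash-based hasher). Only the write lock on Add and the read lock on Test mirror the description above.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sync"
)

// SafeFilter is a hypothetical thread-safe Bloom filter sketch.
type SafeFilter struct {
	mu    sync.RWMutex
	bits  []uint64 // bitset packed into 64-bit words
	m, k  uint64   // number of bits and number of hash positions
	count uint64   // items added, for introspection only
}

func NewSafeFilter(m, k uint64) *SafeFilter {
	return &SafeFilter{bits: make([]uint64, (m+63)/64), m: m, k: k}
}

// positions is a stand-in hasher: one FNV-1a sum split into two halves.
func (f *SafeFilter) positions(data []byte) []uint64 {
	h := fnv.New64a()
	h.Write(data)
	sum := h.Sum64()
	h1, h2 := sum&0xffffffff, sum>>32
	pos := make([]uint64, f.k)
	for i := range pos {
		pos[i] = (h1 + uint64(i)*h2) % f.m
	}
	return pos
}

// Add takes the exclusive lock because it mutates the bitset.
func (f *SafeFilter) Add(data []byte) {
	pos := f.positions(data) // hashing needs no lock
	f.mu.Lock()
	defer f.mu.Unlock()
	for _, p := range pos {
		f.bits[p/64] |= 1 << (p % 64)
	}
	f.count++
}

// Test takes the shared lock so many readers can run in parallel.
func (f *SafeFilter) Test(data []byte) bool {
	pos := f.positions(data)
	f.mu.RLock()
	defer f.mu.RUnlock()
	for _, p := range pos {
		if f.bits[p/64]&(1<<(p%64)) == 0 {
			return false // definitely not added
		}
	}
	return true // possibly added
}

func main() {
	f := NewSafeFilter(1<<16, 7)
	var wg sync.WaitGroup
	for g := 0; g < 8; g++ {
		wg.Add(1)
		go func(g int) {
			defer wg.Done()
			f.Add([]byte(fmt.Sprintf("key-%d", g)))
		}(g)
	}
	wg.Wait()
	fmt.Println(f.Test([]byte("key-3"))) // true
}
```

Computing the positions before taking the lock keeps the critical section down to a handful of word operations.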

Thread Safety and Internals:

The filter is guarded by a sync.RWMutex, allowing multiple reads but exclusive writes - enabling safe concurrent usage even under heavy load. Here's what the struct looks like:

The hasher interface provides flexibility. The default implementation uses MurmurHash3 for speed and uniformity.

MurmurHash-Based Hashing:

Instead of relying on Go’s runtime map hashing (exposed through hash/maphash, whose seeds are randomized per process, so the same input hashes differently across runs), bitbloom implements a MurmurHash3-based custom hasher built using the excellent murmur library by Sébastien Paolacci (https://github.com/spaolacci/murmur3).

The interface:

This is both:

Fast: MurmurHash is extremely performant and produces well-distributed hashes.
Deterministic: Same input → same positions → consistent behaviour.

The hash values are derived in a way similar to double hashing, which reduces the cost of generating k distinct hashes from just 2 base hashes.
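The double-hashing idea can be sketched as g_i(x) = (h1(x) + i·h2(x)) mod m, where h1 and h2 are two base hashes. To stay within the standard library, this example swaps MurmurHash3 for a 128-bit FNV-1a sum split into two 64-bit halves. The positions function is an illustration, not bitbloom's hasher.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/fnv"
)

// positions derives k indices in [0, m) from two base hashes using
// g_i(x) = (h1(x) + i*h2(x)) mod m.
func positions(data []byte, k, m uint64) []uint64 {
	h := fnv.New128a()
	h.Write(data)
	sum := h.Sum(nil) // 16 bytes

	h1 := binary.BigEndian.Uint64(sum[:8])
	h2 := binary.BigEndian.Uint64(sum[8:]) | 1 // force an odd, non-zero stride

	out := make([]uint64, k)
	for i := uint64(0); i < k; i++ {
		out[i] = (h1 + i*h2) % m // uint64 overflow simply wraps around
	}
	return out
}

func main() {
	fmt.Println(positions([]byte("bloom"), 7, 95851))
}
```

Only one hash computation is needed per item, no matter how large k is. The remaining positions cost one multiply and one add each.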

Concurrency Considerations:

Go encourages writing concurrent programs and bitbloom is designed to embrace that.

Key design points:

sync.RWMutex for safe parallel reads.
Internal locking means you don’t need to wrap access in your own sync.

This allows thousands of goroutines to hit the filter simultaneously: reads proceed in parallel under the read lock, and only writes take the exclusive lock, as demonstrated in the stress test section later.

Install & Use:

In the next section, we’ll push this filter to the limits with a real-world stress test, hitting millions of ops per second across thousands of goroutines.

Performance & Accuracy Benchmarking:

A Bloom filter is a probabilistic data structure, but it shouldn't be probabilistically slow.

I built bitbloom with performance and concurrency in mind. But how well does it hold up under serious load?

Let’s benchmark the library across three fronts:

Raw throughput (ops/sec under concurrency)
False positive accuracy (does it match the theoretical bound?)
Memory usage

Stress Testing: Ops per Second

I wrote a Go program that:

Spawns 1,000+ goroutines
Each performs thousands of Add() and Test() ops
Shares a single Bloom filter instance
Times the execution and prints throughput

Example outputs on an 8-core machine:

The Bloom filter consistently performs in the 2.2M–2.6M ops/sec range under realistic conditions.

Accuracy Test: False Positive Rate

I inserted 1,000,000 unique items, and then tested 1,000,000 unseen keys to measure the false positive rate:

The result:

Expected result: Around ~1% (since we set p = 0.01)

Observed result: Matches closely across runs

Benchmarking memory usage:

I ran some benchmarks using runtime.ReadMemStats to compare memory usage just after initialization and after a million insertions.

The output:

This tells you:

How much heap is allocated (Alloc)
How much total memory has been used (TotalAlloc)
How much has been reserved by Go (Sys)
How many GC cycles ran
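A measurement along these lines can be sketched with runtime.ReadMemStats. The helper names readMem and report are made up for this example, and a plain []uint64 stands in for the filter's bitset, so the numbers it prints are not the benchmark results discussed here.

```go
package main

import (
	"fmt"
	"runtime"
)

// readMem takes a snapshot of the Go runtime's memory statistics.
func readMem() runtime.MemStats {
	var s runtime.MemStats
	runtime.ReadMemStats(&s)
	return s
}

// report prints the four fields discussed above.
func report(label string, s runtime.MemStats) {
	fmt.Printf("%-6s Alloc=%d KiB TotalAlloc=%d KiB Sys=%d KiB NumGC=%d\n",
		label, s.Alloc/1024, s.TotalAlloc/1024, s.Sys/1024, s.NumGC)
}

func main() {
	report("before", readMem())

	// Stand-in workload: a 2-million-bit bitset with every bit set.
	bits := make([]uint64, 2_000_000/64)
	for i := range bits {
		bits[i] = ^uint64(0)
	}

	report("after", readMem())
	runtime.KeepAlive(bits) // keep the bitset reachable until after the snapshot
}
```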

The actual in-use memory (Alloc) stays extremely low (~2.2 MB), which makes sense since:

The Bloom filter’s memory usage depends only on m (size of bitset) and not on number of insertions.
I am using a compact bitset (e.g., m ≈ 1–2 million bits is roughly 0.125–0.25 MB).

100MB TotalAlloc is from temporary allocations during:

Hashing operations
Slice copying
Other per-insertion allocations

Go’s GC is working well - 38 GCs for 1M insertions is not excessive, and memory use is staying bounded.

Real-World Applications:

Bloom filters aren’t just a theoretical curiosity - they’re used extensively in high-performance systems to reduce latency, avoid unnecessary computation, and optimize memory usage. Here are some real-world use cases where Bloom filters shine:

1. Databases and Storage Engines

Many databases use Bloom filters to minimize costly disk lookups. For example, Apache HBase employs Bloom filters at the block level to quickly determine whether a row or column might exist in a file, avoiding unnecessary reads from disk. Similarly, Cassandra and LevelDB rely on Bloom filters to reduce the number of SSTables accessed during queries.

2. Caching Systems

Bloom filters are often used as a read-through cache guard to prevent cache penetration. Suppose a backend cache (like Redis or Memcached) receives frequent requests for keys that don’t exist. A Bloom filter can sit in front and quickly filter out non-existent keys, reducing load on both the cache and the database behind it.

3. Content Delivery Networks (CDNs)

CDNs use Bloom filters at the edge to keep track of recently served content or known malicious URLs. Browsers have used the same idea: Google Chrome previously used a Bloom filter to store hashes of unsafe URLs for Safe Browsing, allowing the browser to quickly check for potential threats without frequent server queries.

4. Security and Malware Detection

Security systems use Bloom filters to maintain large sets of known bad IPs, malware signatures, or suspicious domains. These can be checked quickly before performing expensive full-pattern matches or fetching threat intelligence data from external services.

5. Distributed Systems and Network Protocols

In distributed architectures, Bloom filters are used to reduce network chatter. A node might send a Bloom filter of its data set to another node so that the receiver can quickly determine which items it’s missing, without transferring full lists. Systems like Apache Hadoop use this pattern to reduce data shuffling during MapReduce jobs.

Limitations and Trade-offs

While Bloom filters are powerful and widely used, they’re not a one-size-fits-all solution. Like any engineering tool, they come with inherent trade-offs that must be carefully considered depending on your use case.

1. No Native Support for Deletion

Standard Bloom filters do not support deletions. Once an item is inserted, its bits cannot be cleared safely, because other items may share those bits and clearing them would introduce false negatives. This limitation can be problematic in scenarios where your data set is dynamic and items frequently expire or get removed.

Mitigation:

A common solution is to use a Counting Bloom Filter, which replaces the underlying bit array with an array of counters. Each insertion increments the counters for the k hash positions, and each deletion decrements them. This adds memory overhead but enables safe removals at the cost of increased complexity and potential counter overflows.
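A counting variant can be sketched by swapping the bits for small counters. Everything here is illustrative: the CountingFilter type, the 8-bit saturating counters, and the seeded FNV-1a hashing are assumptions, and the sketch is not thread-safe.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// CountingFilter is a hypothetical counting Bloom filter with 8-bit counters.
type CountingFilter struct {
	counts []uint8
	k      int
}

func NewCountingFilter(m, k int) *CountingFilter {
	return &CountingFilter{counts: make([]uint8, m), k: k}
}

// index computes the i-th seeded hash of data as a counter position.
func (f *CountingFilter) index(i int, data []byte) int {
	h := fnv.New64a()
	h.Write([]byte{byte(i)})
	h.Write(data)
	return int(h.Sum64() % uint64(len(f.counts)))
}

// Add increments the k counters, saturating at 255 instead of overflowing.
func (f *CountingFilter) Add(data []byte) {
	for i := 0; i < f.k; i++ {
		if idx := f.index(i, data); f.counts[idx] < 255 {
			f.counts[idx]++
		}
	}
}

// Remove decrements the k counters. It must only be called for items that
// were added; saturated counters are left alone because their true value
// is unknown.
func (f *CountingFilter) Remove(data []byte) {
	for i := 0; i < f.k; i++ {
		if idx := f.index(i, data); f.counts[idx] > 0 && f.counts[idx] < 255 {
			f.counts[idx]--
		}
	}
}

// Test reports whether all k counters are non-zero.
func (f *CountingFilter) Test(data []byte) bool {
	for i := 0; i < f.k; i++ {
		if f.counts[f.index(i, data)] == 0 {
			return false
		}
	}
	return true
}

func main() {
	f := NewCountingFilter(1024, 3)
	f.Add([]byte("a"))
	f.Add([]byte("b"))
	f.Remove([]byte("a"))
	fmt.Println(f.Test([]byte("b"))) // true: removing "a" never hides "b"
}
```

With 8-bit counters this uses eight times the memory of a plain bit array, which is the overhead mentioned above.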

2. False Positives Are Inevitable

Bloom filters never produce false negatives, but false positives are always a possibility. That means you might occasionally believe an item exists when it does not.

Choosing the right false positive rate (p):

This is critical and highly application-dependent. A lower false positive rate means:

More memory consumption
More hash functions (slower operations)

Conversely, a higher false positive rate reduces memory usage but increases the chance of incorrect "exists" results. Tuning p involves understanding the cost of false positives in your system. For example:

In a CDN cache, a false positive might mean serving stale content - probably tolerable.
In a security filter, a false positive might block a legitimate user - not acceptable.

3. Choosing n in a Dynamic World

Bloom filters require you to specify the expected number of insertions (n) up front. Underestimating n leads to higher false positive rates; overestimating wastes memory. This is challenging for dynamic workloads or systems where data volume changes over time.

Workaround:

You can use a scalable Bloom filter, which grows as needed by chaining multiple filters with increasing capacity and decreasing false positive targets. However, this adds design complexity.

Bloom filters offer an impressive blend of speed and space efficiency, but they are not magic. A well-engineered system must balance their strengths with their limitations, especially in high-availability or high-accuracy environments.

Conclusion:

Bloom filters are a powerful tool in an engineer’s toolkit - compact, fast, and surprisingly versatile. From preventing unnecessary database hits to guarding high-latency operations, they offer an elegant solution where approximate answers are good enough.

In this article, we explored the theory behind Bloom filters, walked through their internals, and benchmarked their performance across speed, memory, and accuracy. Along the way, we also walked through a fully concurrent Bloom filter in Go.

If you're looking to integrate a production-ready Bloom filter into your Go projects, take a look at my library, bitbloom - the library we walked through earlier. It’s fast, reliable, and designed with performance in mind.

Probabilistic data structures may not give you perfect answers, but in the right contexts, they’ll give you the right answers, fast.

Sources and further reading:

bitbloom on GitHub - The Go library I built and benchmarked in this article. A production-ready Bloom filter implementation with clean API design, fast operations, and concurrency safety (https://pkg.go.dev/github.com/umang-sinha/bitbloom).
Network Applications of Bloom Filters: A Survey - Broder and Mitzenmacher (https://www.eecs.harvard.edu/~michaelm/postscripts/im2005b.pdf)
The Veracious Counting Bloom Filter - Brindha Palanisamy and Senthilkumar Athappan (https://www.iajit.org/portal/PDF/vol.%2014,%20no%206/9285.pdf)

PostgreSQL UUID Performance: Benchmarking Random (v4) and Time-based (v7) UUIDs

Umang Sinha — Fri, 23 May 2025 03:23:16 +0000

Universally Unique Identifiers (UUIDs) are 128-bit values designed to ensure uniqueness across systems, without requiring any central coordination. UUIDv4 has 122 bits of randomness, so by the birthday bound a sample of 3.26×10¹⁶ values still has a 99.99% chance of containing no duplicates. This makes them ideal for use as primary keys in a database, particularly in distributed systems.

One of the most widely used UUID formats is UUIDv4, which relies entirely on random number generation. Because they don’t encode any order or time information, UUIDv4s are inherently non-sequential.

This randomness makes them excellent for ensuring uniqueness across nodes, but it also leads to poor index locality in databases like PostgreSQL, especially when used as primary keys. Each insert happens in a random location in the B-tree, which causes frequent page splits and bloated indexes over time.

To address this, the IETF standardized UUIDv7 in RFC 9562, a time-based format that embeds a millisecond-resolution Unix timestamp in the high-order bits.

The resulting UUIDs remain globally unique while being roughly monotonically increasing, which makes them far more index-friendly for time-ordered inserts and queries in databases like PostgreSQL.

But does UUIDv7 actually perform better in practice, particularly in PostgreSQL?

In this article, we'll benchmark UUIDv4 and UUIDv7 in PostgreSQL by comparing their insert speeds, index sizes, and query performance. We'll dig into how the structure of UUIDs impacts B-tree behavior, and whether switching to UUIDv7 is worth it for modern applications.

UUID Versions Explained:

UUIDs are typically represented as 36-character strings: 32 hexadecimal digits separated by four hyphens. Despite their compact string appearance, they carry structured meaning depending on the version.

A UUID is split into five parts:

xxxxxxxx-xxxx-Mxxx-Nxxx-xxxxxxxxxxxx

M is the version (e.g., 4 for UUIDv4, 7 for UUIDv7).
N encodes the variant (usually 10xx for RFC 4122 compliant UUIDs).
The rest is either random or encodes time/data, depending on the version.

UUIDv4: Random

UUIDv4 is the most commonly used version. It sets only two fields:

Version = 4 (in the 13th hex digit).
Variant = 10xx (in the 17th hex digit).

Everything else is pure randomness. This ensures high entropy but results in non-sequential values.

Downside: Poor locality in B-tree indexes due to randomness.
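As a minimal sketch of how little structure a UUIDv4 has, the code below fills 16 bytes from crypto/rand and then overwrites only the version and variant bits. The helper names newV4 and format are made up for this example. In production code a maintained UUID library is the safer choice.

```go
package main

import (
	"crypto/rand"
	"fmt"
)

// newV4 returns 16 random bytes with the version and variant bits set.
func newV4() ([16]byte, error) {
	var u [16]byte
	if _, err := rand.Read(u[:]); err != nil {
		return u, err
	}
	u[6] = (u[6] & 0x0f) | 0x40 // version 4: high nibble of byte 6 (13th hex digit)
	u[8] = (u[8] & 0x3f) | 0x80 // variant 10xx: top two bits of byte 8 (17th hex digit)
	return u, nil
}

// format renders the bytes in the usual 8-4-4-4-12 hexadecimal layout.
func format(u [16]byte) string {
	return fmt.Sprintf("%x-%x-%x-%x-%x", u[0:4], u[4:6], u[6:8], u[8:10], u[10:16])
}

func main() {
	u, err := newV4()
	if err != nil {
		panic(err)
	}
	fmt.Println(format(u))
}
```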

UUIDv7: Time-based

UUIDv7 was introduced to improve temporal ordering and index performance. It uses the high bits to encode a Unix timestamp in milliseconds, while the remaining bits are random to preserve uniqueness.

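Bit layout of UUIDv7:

48 bits: Unix timestamp in milliseconds
4 bits: version (0111)
12 bits: random data (rand_a)
2 bits: variant (10)
62 bits: random data (rand_b)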

Benefit: Maintains insertion order in databases, improving index locality and reducing write amplification.
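The ordering property follows directly from where the timestamp sits. The sketch below builds a UUIDv7-style value by hand: a 48-bit big-endian millisecond timestamp, followed by random bytes with the version and variant bits overwritten. The names newV7 and less are assumptions for this example. It does not add a counter for values created within the same millisecond.

```go
package main

import (
	"bytes"
	"crypto/rand"
	"fmt"
	"time"
)

// newV7 builds a UUIDv7-style value for the given Unix time in milliseconds.
func newV7(unixMs uint64) ([16]byte, error) {
	var u [16]byte
	if _, err := rand.Read(u[:]); err != nil {
		return u, err
	}
	// 48-bit big-endian timestamp in the six most significant bytes.
	for i := 0; i < 6; i++ {
		u[i] = byte(unixMs >> (8 * uint(5-i)))
	}
	u[6] = (u[6] & 0x0f) | 0x70 // version 7
	u[8] = (u[8] & 0x3f) | 0x80 // variant 10xx
	return u, nil
}

// less compares two UUIDs byte by byte, the order a byte-wise sort would use.
func less(a, b [16]byte) bool {
	return bytes.Compare(a[:], b[:]) < 0
}

func main() {
	now := uint64(time.Now().UnixMilli())
	older, err := newV7(now)
	if err != nil {
		panic(err)
	}
	newer, err := newV7(now + 1)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%x\n%x\n", older, newer)
	fmt.Println(less(older, newer)) // true: the later timestamp sorts after
}
```

Two values created in the same millisecond share their first six bytes, so their relative order depends on the random bits.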

Why Key Locality Matters in PostgreSQL:

Choosing the right primary key doesn’t just influence how your data is uniquely identified, it also has a profound impact on how efficiently that data is stored, indexed, and retrieved. One often-overlooked consideration is how your key choice affects data locality and write performance within the database engine.

PostgreSQL, like many relational databases, uses B-tree indexes to organize and access primary key values. These indexes store keys in sorted order, making them highly efficient for range queries and lookups, but also sensitive to the order in which keys are inserted.

How B-tree Indexes Work in PostgreSQL:

A B-tree in PostgreSQL is made up of fixed-size pages, usually 8 KB in size, that hold sorted key-value entries. When a new row is inserted into a table with a B-tree-indexed primary key, PostgreSQL traverses the tree to find the appropriate page where the new key belongs. If the target page has space, the new entry is inserted directly. But if the page is full, PostgreSQL splits it into two pages: one holding the lower half of the entries, and the other the upper half. The tree is then updated to reflect this structural change.

Page splits are not just computationally expensive, but they also result in additional I/O, increased write amplification, and potential index bloat. Over time, a heavily fragmented index becomes slower to write to and less efficient to read from.

Why Random UUIDs (v4) Hurt Performance:

UUIDv4 is popular for primary keys because it provides excellent randomness and extremely low collision risk. However, this randomness comes at a cost.

Because UUIDv4 values are entirely random, new entries are inserted into arbitrary positions in the B-tree. PostgreSQL cannot make any assumptions about where the next UUID will fall in the keyspace, so every insert effectively becomes a random-access write. This leads to frequent page splits, because new keys keep landing in pages all over the tree that are already full.

Over time, this causes the index to bloat, increases write amplification, and reduces the effectiveness of caching, since recently used index pages are unlikely to be reused soon. Additionally, queries that rely on ordered traversal, such as ORDER BY id DESC or cursor-based pagination using WHERE id > ?, suffer from poor performance because the data is scattered non-sequentially throughout the tree.

Why UUIDv7 Fixes This:

UUIDv7 was introduced to solve this very problem. It embeds a 48-bit Unix timestamp (in milliseconds) into the most significant bits of the UUID, resulting in values that are roughly time-ordered.

This means that UUIDv7 values generally increase over time (values generated within the same millisecond are ordered only by their random bits, unless the generator adds a counter), which dramatically improves index locality. As new records are inserted, their UUIDv7 keys tend to fall at the end of the B-tree. This significantly reduces the likelihood of page splits, minimizes fragmentation, and allows PostgreSQL to optimize for sequential writes.

Because of this time-based structure, UUIDv7 provides behavior similar to that of auto-incrementing integers, but without sacrificing the global uniqueness and decentralization benefits that UUIDs offer. The timestamp ensures order, while the random bits in the lower portion of the UUID maintain uniqueness even across distributed systems.
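
The "timestamp ensures order" property can be checked directly on the string form. The sketch below is only an illustration: the function name and the two sample UUIDs are made up, and it assumes the input is a well-formed UUIDv7. It reads the first 12 hex digits as the millisecond timestamp and shows that plain string sorting agrees with time order.

```javascript
// Read the 48-bit millisecond timestamp from the first 12 hex digits of a UUIDv7.
function uuidv7TimestampMs(uuid) {
  const hex = uuid.replace(/-/g, "");
  return parseInt(hex.slice(0, 12), 16); // 48 bits fit safely in a JS number
}

// Two hypothetical UUIDv7 values created one millisecond apart.
const earlier = "0195a1b2-c3d4-7e5f-8a1b-2c3d4e5f6a7b";
const later = "0195a1b2-c3d5-7000-9000-000000000000";

console.log(new Date(uuidv7TimestampMs(earlier)).toISOString());
console.log(uuidv7TimestampMs(later) - uuidv7TimestampMs(earlier)); // 1

// Lowercase hex strings of equal length sort in numeric order,
// so sorting the IDs as text also sorts them by creation time.
console.log([later, earlier].sort()[0] === earlier); // true
```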

In practice, systems using UUIDv7 as a primary key observe lower write amplification, reduced disk I/O, and faster performance for queries that involve ordered traversal or cursor-based pagination. The B-tree remains compact and more predictable, which also improves performance under high write loads or concurrent inserts.

While UUIDv4 excels in uniqueness, UUIDv7 offers a practical compromise, retaining uniqueness while gaining the efficiency of ordered inserts.

In summary:

Experiment Setup:

To evaluate the practical impact of UUIDv4 vs UUIDv7 on PostgreSQL performance, we will run benchmarks using identical table structures, data and insertion logic. The only variable that changes is the UUID version used for the primary key.

a. Database Configuration:

I will be using PostgreSQL 16 for the benchmark, hosted locally inside a Docker container on a system with:

CPU: 8 cores
Memory: 16 GB RAM
Disk: NVMe SSD
Extensions: None required, since UUIDs will be generated client-side using Go.

I will be using pgAdmin 4 to run any SQL queries against the database.

b. Table Schema:

Two tables are created with the exact same structure. Only the key generation strategy differs.

Each row will have a UUID key and a small random string in the payload column to simulate realistic row sizes.

c. UUID Generation:

To eliminate any bias in benchmarking, both UUIDv4 and UUIDv7 values will be generated using the same Go script, in memory, before the insert operation starts. This allows us to isolate and measure only the time taken by the database to perform inserts.

UUIDv4: generated using github.com/google/uuid
UUIDv7: generated using github.com/samborkent/uuidv7

This ensures:

No bias from generation latency during insert timing
Uniform client-side CPU and memory usage
Identical batching and transaction logic for inserts

We will pre-generate the full dataset (UUID + payload) in slices of structs, and measure only the database insertion time, excluding UUID and payload generation from the timing. The insertions will be performed using parameterized queries in batches (e.g. 10,000 rows per batch) using database/sql.
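
The batching idea can be sketched independently of the Go script. The JavaScript below is not the benchmark code; it only shows how pre-generated rows might be split into batches and turned into one parameterized multi-row INSERT per batch. The table name uuid_v7_demo, the column names, and the helper names are assumptions made for the example.

```javascript
// Split an array into consecutive batches of at most `size` items.
function chunk(rows, size) {
  const batches = [];
  for (let i = 0; i < rows.length; i += size) {
    batches.push(rows.slice(i, i + size));
  }
  return batches;
}

// Build one parameterized multi-row INSERT for a batch.
// `table` must be a trusted identifier; only the values are parameterized.
function buildBatchInsert(table, batch) {
  const placeholders = batch
    .map((_, i) => `($${2 * i + 1}, $${2 * i + 2})`)
    .join(", ");
  const params = batch.flatMap((row) => [row.id, row.payload]);
  return {
    text: `INSERT INTO ${table} (id, payload) VALUES ${placeholders}`,
    params,
  };
}

// Hypothetical pre-generated rows.
const rows = Array.from({ length: 25 }, (_, i) => ({
  id: `id-${i}`,
  payload: `payload-${i}`,
}));

const batches = chunk(rows, 10);
console.log(batches.length); // 3
console.log(buildBatchInsert("uuid_v7_demo", batches[2].slice(0, 2)).text);
```

Building the statements from data that already sits in memory keeps generation cost out of the timed section, which matches the goal stated above.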

d. Go script responsibilities:

The Go benchmark script will:

Generate 10 million UUIDs of each type (v4 and v7)
Pair each UUID with a random payload string
Store the entries in memory
Insert the entries into the respective tables in batches

This setup ensures we're isolating the effect of UUID key locality on B-tree index behaviour without being skewed by unrelated overhead.

The Benchmarking Script:

To isolate and accurately measure the impact of UUID version on insert and query performance, we will write a Go benchmarking script that:

Generates 10 million UUIDs and payloads in memory
Times only the insert phase, excluding UUID generation and payload creation time.
Performs batch inserts using PostgreSQL’s pq driver

a. Dependencies:

We will use the following Go packages:

b. UUID and Payload Generation:

Before benchmarking inserts, we generate all UUIDs and payloads in memory:

c. Insert Logic:

We use PostgreSQL's pq driver to perform batch inserts of size 10,000 rows:

d. Execution:

Note:

Both UUIDs and payloads are generated before timing begins to ensure we are measuring only database performance
Batching improves performance and mirrors how real-world services insert data at scale.

Benchmark Execution:

Before diving into raw performance numbers, I would love to demonstrate a key property of UUIDv7 - monotonicity.

Unlike UUIDv4 (which is completely random), UUIDv7 is designed to be time-ordered, embedding the current Unix timestamp (in milliseconds) into the most significant bits of the UUID. This allows for natural sortability, better index locality, and potential performance advantages for write-heavy workloads.

Here’s a set of UUIDv7 values I generated in Go, pausing for 1 millisecond between each call:

If you observe closely, the hexadecimal digits in the second segment of each UUID (after the first hyphen) are gradually increasing:

2f2d → 2f2e → 2f30 → 2f31 → … → 2f37

This confirms that UUIDv7 values sort in the order in which they were generated, which should result in fewer B-tree page splits in PostgreSQL and better index write locality - a hypothesis we will validate in the benchmarks below.

Insert Performance:

I inserted 10 million rows into each table using batched inserts (10,000 rows per batch), with UUIDs and payloads pre-generated in memory to ensure the measurement reflects only database insertion time.

Analysis:

i. UUIDv7 inserts were ~34.8% faster than UUIDv4 inserts.

ii. The performance gain is due to UUIDv7’s monotonic nature, which improves B-tree index locality:

UUIDv4 inserts scatter randomly across the index, causing frequent page splits and higher I/O overhead.
UUIDv7 inserts append in order, minimizing page splits and promoting sequential writes within index pages.

iii. This performance improvement is expected to become more pronounced as the table grows and the B-tree index gets deeper, although this benchmark measured only a single table size.

In a high-insert workload (like logs, events, or user activity tracking), switching from UUIDv4 to UUIDv7 can yield tangible write performance benefits.

Disk Usage:

To assess how the UUID type affects storage footprint, I measured the total relation size (table + index) using:

Analysis:

i. The UUIDv7 table uses ~175 MB less disk space than the UUIDv4 table, despite having the same number of rows and exactly the same schema.

ii. This can be attributed to:

Index locality: UUIDv7s are monotonically increasing, leading to sequential inserts and more compact B-tree indexes.
Fewer page splits and better fill factor due to reduced randomness in the index keys.

iii. UUIDv4, being completely random, causes heavier index fragmentation, leading to larger storage usage.

This highlights that UUIDv7 not only improves insert performance but is also more storage-efficient, especially at scale.

Index Size:

In addition to measuring the total disk usage, I also analyzed the disk footprint of the primary key indexes. Since both tables use a UUID PRIMARY KEY, PostgreSQL automatically creates a B-tree index on the id column.

I queried the size of the index alone using the following query:

Analysis:

i. The index built on UUIDv7 is 174 MB smaller than the one on UUIDv4.

ii. This translates to a ~22% reduction in index size.

iii. The difference is a direct result of UUIDv7's monotonic nature, which provides:

Improved index locality
Fewer B-tree page splits
Tighter physical clustering of keys
Better cache utilization

Smaller indexes improve read performance, particularly for range scans and point lookups.

They also reduce I/O pressure, making UUIDv7 a better choice for write-heavy and read-latency-sensitive workloads at scale.

Query Performance:

I measured point lookup and range scan performance for both UUIDv4 and UUIDv7 using the following queries:

Point Lookup:

Analysis:

UUIDv7 has significantly lower planning and execution times than UUIDv4.
UUIDv7's monotonically increasing nature improves the index's locality, leading to faster lookups.

Range scan:

Analysis:

While UUIDv7 takes slightly more time during planning, its execution time is much faster.
The sequential nature of UUIDv7 reduces index fragmentation, providing quicker access to sequential data, thus improving range scan performance.

UUIDv7 outperforms UUIDv4 in both point lookups and range scans, with lower execution times, thanks to its monotonic sequence.

The lower disk usage and faster query performance make UUIDv7 a more efficient choice for databases, especially when querying large datasets.

Practical Considerations:

While UUIDv7 clearly demonstrates performance and storage advantages, choosing it in production should still account for a few practical factors:

Pros of UUIDv7:

Monotonicity = Speed: Writes are faster due to better index locality and reduced page splits.
Smaller Indexes: Less disk space, better cache efficiency.
Faster Range Queries: Naturally sortable and ideal for time-ordered data (e.g., logs, events, timelines).
Globally Unique + Time Encoded: You get the benefits of a UUID plus implicit timestamping.

Caveats:

Tooling & Compatibility: Some older systems, libraries, or languages may not support UUIDv7 yet.
Randomness & Privacy: UUIDv7 includes a timestamp. If your use case demands anonymity or unpredictability, consider this a tradeoff.
Availability in Libraries: While UUIDv4 is standard and widely supported, UUIDv7 still requires third-party packages in many ecosystems.

Conclusion:

This benchmark set out to answer a simple question: “Is UUIDv7 actually better than UUIDv4 in PostgreSQL?”

The results speak for themselves:

In summary:

UUIDv7 not only preserves global uniqueness but also enhances PostgreSQL performance in meaningful ways.

If you're building systems that scale, especially write-heavy ones, it's a very strong candidate.

All code used in this benchmark, including UUID generation, PostgreSQL schema, and Go benchmarking logic is available here:

https://github.com/umang-sinha/postgres-uuid-benchmark

Feel free to fork, run, or modify it for your own experiments!

Sources and further reading:

Why UUIDv7 is Revolutionizing Time-Ordered Identifiers - https://corner.buka.sh/why-uuidv7-is-revolutionizing-time-ordered-identifiers-for-modern-systems/
UUIDs Are Bad for Database Index Performance - Enter UUIDv7! - https://www.toomanyafterthoughts.com/uuids-are-bad-for-database-index-performance-uuid7/
Unexpected Downsides of UUID Keys in PostgreSQL - https://www.cybertec-postgresql.com/en/unexpected-downsides-of-uuid-keys-in-postgresql/
How PostgreSQL Indexes Can Negatively Impact Performance - https://www.percona.com/blog/postgresql-indexes-can-hurt-you-negative-effects-and-the-costs-involved/
Benchmarking UUIDv4 vs UUIDv7 in PostgreSQL - https://mblum.me/posts/pg-uuidv7-benchmark/

Beyond JavaScript - Why 0.1 + 0.2 doesn't equal 0.3 in programming

Umang Sinha — Fri, 13 Sep 2024 13:44:48 +0000

JavaScript is frequently ridiculed when developers first encounter this seemingly baffling result:

0.1 + 0.2 == 0.30000000000000004
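
The result is easy to reproduce in Node.js or a browser console. The snippet below also includes a small tolerance-based comparison; the helper name nearlyEqual and the choice of Number.EPSILON as the tolerance are assumptions for illustration, not a universal recommendation.

```javascript
const sum = 0.1 + 0.2;
console.log(sum); // 0.30000000000000004
console.log(sum === 0.3); // false

// Compare with a relative tolerance instead of exact equality.
function nearlyEqual(a, b, epsilon = Number.EPSILON) {
  return Math.abs(a - b) <= epsilon * Math.max(1, Math.abs(a), Math.abs(b));
}

console.log(nearlyEqual(sum, 0.3)); // true
```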

Memes about JavaScript's handling of numbers are widespread, often leading many to believe that this behaviour is unique to the language.

However, this quirk isn't just limited to JavaScript. It is a consequence of how most programming languages handle floating-point arithmetic.

For instance, here are code snippets from Java and Go that produce similar results:

Computers can natively only store integers. They don't understand fractions. (How will they? The only way computers can do arithmetic is by turning some lights on or off. The light can either be on or off. It can't be "half" on!) They need some way of representing floating point numbers. Since this representation is not perfectly accurate, more often than not, 0.1 + 0.2 does not equal 0.3.

All fractions whose denominators are made of prime factors of the number system's base can be cleanly expressed while any other fractions would have repeating decimals. For example, in the number system with base 10, fractions like 1/2, 1/4, 1/5, 1/10 are cleanly represented because the denominators in each case are made up of 2 or 5 - the prime factors of 10. However, fractions like 1/3, 1/6, 1/7 all have recurring decimals.

Similarly, in the binary system, fractions like 1/2, 1/4 and 1/8 are cleanly expressed, while fractions whose denominators are not powers of 2 (such as 1/10 or 1/5) have recurring binary expansions. When you perform arithmetic on these recurring expansions, which must be cut off to fit in a fixed number of bits, you end up with leftovers that carry over when you convert the computer's binary representation back to a human-readable base-10 representation. This is why the results are only approximately correct.
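
The prime-factor rule above can be turned into a small check. This is a sketch with hypothetical function names: it strips from the denominator every factor it shares with the base, and the fraction 1/denominator terminates exactly when nothing is left over.

```javascript
// Greatest common divisor (Euclid's algorithm).
function gcd(a, b) {
  while (b !== 0) {
    [a, b] = [b, a % b];
  }
  return a;
}

// True if 1/denominator has a finite expansion in the given base.
function terminatesInBase(denominator, base) {
  let d = denominator;
  let g = gcd(d, base);
  while (g > 1) {
    while (d % g === 0) d /= g; // remove the shared factor completely
    g = gcd(d, base);
  }
  return d === 1;
}

console.log(terminatesInBase(10, 10)); // true:  1/10 = 0.1 in decimal
console.log(terminatesInBase(3, 10)); // false: 1/3 recurs in decimal
console.log(terminatesInBase(8, 2)); // true:  1/8 = 0.001 in binary
console.log(terminatesInBase(10, 2)); // false: 1/10 recurs in binary
```

The last line is the heart of the matter: 0.1 is a tidy decimal, but its denominator contains the factor 5, which binary cannot express exactly.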

Now that we've established that this problem is not exclusive to JavaScript, let's explore how floating-point numbers are represented and processed under the hood to understand why this behaviour occurs.

In order to understand how floating point numbers are represented and processed under the hood, we would first have to understand the IEEE 754 floating point standard.

IEEE 754 standard is a widely used specification for representing and performing arithmetic on floating-point numbers in computer systems. It was created to guarantee consistency when using floating-point arithmetic on various computing platforms. Most programming languages and hardware implementations (CPUs, GPUs, etc.) adhere to this standard.

This is how a number is denoted in IEEE 754 format: (-1)^s × M × 2^E

Here s is the sign bit (0 for positive, 1 for negative), M is the mantissa (holds the digits of the number) and E is the exponent which determines the scale of the number.

You would not be able to find any integer values for M and E that can exactly represent numbers like 0.1, 0.2 or 0.3 in this format. We can only pick values for M and E that give the closest result.

Here is a tool you could use to determine the IEEE 754 notations of decimal numbers: https://www.h-schmidt.net/FloatConverter/IEEE754.html

IEEE 754 notation of 0.25:

IEEE 754 notation of 0.1 and 0.2 respectively:

Please note that the error due to conversion in case of 0.25 was 0, while 0.1 and 0.2 had non-zero errors.

IEEE 754 defines the following formats for representing floating-point numbers:

Single-precision (32-bit): 1 bit for sign, 8 bits for exponent, 23 bits for mantissa
Double-precision (64-bit): 1 bit for sign, 11 bits for exponent, 52 bits for mantissa

For the sake of simplicity, let us consider the single-precision format that uses 32 bits.

The 32 bit representation of 0.1 is:

0 01111011 10011001100110011001101

Here the first bit represents the sign (0 which means positive in this case), the next 8 bits (01111011) represent the exponent and the final 23 bits (10011001100110011001101) represent the mantissa.

This is not an exact representation. It represents ≈ 0.100000001490116119384765625

Similarly, the 32 bit representation of 0.2 is:

0 01111100 10011001100110011001101

This is not an exact representation either. It represents ≈ 0.20000000298023223876953125

When added, this results in:

0 01111101 00110011001100110011010

which is ≈ 0.30000001192092896 in decimal representation.
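
These bit patterns can be inspected from JavaScript as well. The sketch below assumes a runtime with DataView and Math.fround (any modern browser or Node.js); the helper name float32Bits is made up. It stores a value as a 32-bit float and splits the bits into sign, exponent and mantissa.

```javascript
// Return the sign, exponent and mantissa bit strings of a number stored as float32.
function float32Bits(value) {
  const view = new DataView(new ArrayBuffer(4));
  view.setFloat32(0, value); // big-endian by default
  const bits = view.getUint32(0).toString(2).padStart(32, "0");
  return {
    sign: bits[0],
    exponent: bits.slice(1, 9),
    mantissa: bits.slice(9),
  };
}

// Math.fround rounds a double to the nearest single-precision value.
const a = Math.fround(0.1);
const b = Math.fround(0.2);
const singleSum = Math.fround(a + b);

console.log(float32Bits(0.1)); // exponent 01111011, mantissa 10011001100110011001101
console.log(float32Bits(0.2)); // exponent 01111100, same mantissa
console.log(float32Bits(singleSum));
console.log(a); // 0.10000000149011612
console.log(singleSum); // 0.30000001192092896
```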

In conclusion, the seemingly perplexing result of 0.1 + 0.2 not yielding 0.3 is not an anomaly specific to JavaScript, but a consequence of the limitations of floating-point arithmetic across programming languages. The roots of this behaviour lie in the binary representation of numbers, which inherently leads to precision errors when handling certain fractions.

The CORS Conundrum

Umang Sinha — Sun, 11 Feb 2024 02:27:44 +0000

If you're a back end developer you must have been in a position where the API you wrote worked perfectly fine when tested with Postman, cURL or any other API testing tool but as soon as the frontend application started consuming your API, the following much dreaded error started appearing:

If you've been there, this article is for you.

We will dive deep into CORS and explore what it is, why it is needed and how to deal with it.

What is CORS?

According to MDN Web Docs, "Cross-Origin Resource Sharing (CORS) is an HTTP-header based mechanism that allows a server to indicate any origins (domain, scheme, or port) other than its own from which a browser should permit loading resources."

If that didn't make a lot of sense to you, here's a diagram to simplify things:

When a website hosted at xyz.com sends requests to a web server also hosted at xyz.com (same domain, protocol and port), the request is a 'same-origin request'. These requests are permitted by the browser's Same-Origin Policy (SOP) and have fewer restrictions.

When a website hosted at xyz.com sends requests to a web server hosted at abc.com, the request is a 'cross-origin request'. By default, web browsers restrict cross-origin requests to prevent unauthorized access to sensitive data or resources. However, there are mechanisms such as Cross-Origin Resource Sharing (CORS) that allow servers to explicitly authorize cross-origin requests from specific origins.

If the website hosted at xyz.com wants to fetch data from an API hosted at abc.com, the server at abc.com needs to include CORS headers in its response to allow requests from xyz.com.
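
Whether two URLs share an origin can be checked with the standard URL class, which exposes the scheme, host and port as a single origin string. The helper name and the sample URLs below are assumptions for illustration.

```javascript
// Two URLs are same-origin when scheme, host and port all match.
function isSameOrigin(a, b) {
  return new URL(a).origin === new URL(b).origin;
}

console.log(isSameOrigin("https://xyz.com/page", "https://xyz.com/api/users")); // true
console.log(isSameOrigin("https://xyz.com/page", "https://abc.com/api/users")); // false (different domain)
console.log(isSameOrigin("http://xyz.com/page", "https://xyz.com/page")); // false (different scheme)
console.log(isSameOrigin("http://xyz.com:4000/", "http://xyz.com:3000/")); // false (different port)
```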

Since the browser is the one who restricts cross-origin resource sharing, the API you built works in Postman, cURL and other API testing tools but not in the browser.

Why do browsers restrict cross-origin resource sharing?

Let us assume the following three scenarios:

Person A gets tricked into clicking on a specially crafted link that they received over email or found embedded on a malicious website. Person A was logged into their bank account in the same browser session thereby allowing the malicious request to be executed and transferring funds from A's account to the attacker's account without A's knowledge.
The attacker crafts a request to change the password of the victim's account on a web service. Person B is then lured into clicking a link embedded in a phishing email or disguised as a legitimate action. If Person B is logged into the targeted web service in the same browser session, the malicious request gets executed, thereby changing the password of the victim's account.
Person C clicks on a phishing link that they received. Person C was also logged into their email account in the same browser session. The malicious request is thus executed and email forwarding gets configured without Person C's knowledge. All the incoming emails from Person C's email now start being forwarded to the attacker's email.

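All of the above scenarios are examples of CSRF (Cross-Site Request Forgery) attacks. Browsers restrict cross-origin requests by enforcing the SOP, which limits what a page from one origin can do with resources from another and so reduces the risk of unauthorized access. CORS is the mechanism a server uses to relax that restriction for origins it trusts. Note that CORS is not a complete CSRF defense on its own: "simple" requests such as form submissions are still sent by the browser, so sensitive operations usually also need protections such as CSRF tokens or SameSite cookies. By implementing proper CORS policies, web developers can control access to resources on their servers and ensure that sensitive operations are only allowed from trusted origins, thereby enhancing the overall security of their web applications.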

So, how are CORS policies implemented on the server-side?

To demonstrate this I will be using a simple server I built using ExpressJS. The server supports three endpoints:

/add-student - takes id and name as inputs and adds a new student
/delete-student - takes id as input and deletes the student with that id
/get-student - takes id as input and fetches the student

There is also a simple frontend that we will be using to send these API requests. It was made using plain HTML and it looks something like this:

Here's the code for the client:

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <title>Student</title>
  </head>
  <body>
    <input type="text" id="textField1" placeholder="Student ID" />
    <input type="text" id="textField2" placeholder="Student name" />
    <button onclick="addStudent()">Add student</button><br /><br />

    <input type="text" id="textField3" placeholder="Student ID to delete" />
    <button onclick="deleteStudent()">Delete student</button><br /><br />

    <input type="text" id="textField4" placeholder="Get student" />
    <button onclick="getStudent()">Get student</button><br /><br />

    <script>
      let apiUrl = "http://127.0.0.1:3000";

      function addStudent() {
        let studentId = document.getElementById("textField1").value;
        let studentName = document.getElementById("textField2").value;

        fetch(apiUrl + "/add-student", {
          method: "POST",
          headers: {
            "Content-Type": "application/json",
          },
          body: JSON.stringify({ name: studentName, id: studentId }),
        }).then((response) => {
          if (response.status === 200) {
            console.log(response);
          } else {
            const div = document.createElement("div");
            div.innerText = "Something went wrong";
            document.body.appendChild(div);
          }
        });
      }

      function deleteStudent() {
        let idToDelete = document.getElementById("textField3").value;

        fetch(apiUrl + "/delete-student", {
          method: "DELETE",
          headers: {
            "Content-Type": "application/json",
          },
          body: JSON.stringify({ id: idToDelete }),
        }).then((response) => {
          if (response.status === 200) {
            console.log(response);
          } else {
            const div = document.createElement("div");
            div.innerText = "Something went wrong";
            document.body.appendChild(div);
          }
        });
      }

      function getStudent() {
        let idToSearch = document.getElementById("textField4").value;
        fetch(apiUrl + "/get-student" + "?id=" + idToSearch, {
          method: "GET",
          headers: {
            "Content-Type": "application/json",
          },
        }).then((response) => {
          if (response.status === 200) {
            console.log(response);
          } else {
            const div = document.createElement("div");
            div.innerText = "Something went wrong";
            document.body.appendChild(div);
          }
        });
      }
    </script>
  </body>
</html>

And the server:

const express = require("express");

const app = express();
const port = 3000;

app.use(express.json());
app.use(express.urlencoded({ extended: false }));

let students = [];

app.get("/get-student", (req, res) => {
  const idToSearch = req.query?.id;
  for (let i = 0; i < students.length; i++) {
    if (students[i]["id"] == idToSearch) {
      return res.status(200).send({ student: students[i] });
    }
  }
  return res.status(404).send("Not found");
});

app.post("/add-student", (req, res) => {
  const id = req.body?.id;
  const name = req.body?.name;
  students.push({ id, name });
  return res.status(200).send({ students, message: "Successfully added" });
});

app.delete("/delete-student", (req, res) => {
  const idToDelete = req.body?.id;
  for (let i = 0; i < students.length; i++) {
    if (students[i]["id"] == idToDelete) {
      students.splice(i, 1);
      return res.status(200).send({ students });
    }
  }
  return res.status(404).send("Not found");
});

app.listen(port, () => {
  console.log(`Server listening on port ${port}`);
});

And another simple server that serves the HTML file:

const express = require("express");

const app = express();
const port = 4000;

app.get("/", function (req, res) {
  res.sendFile(__dirname + "/index.html");
});

app.listen(port, () => {
  console.log(`Server listening on port ${port}`);
});

The server is listening on port 3000 while the HTML file is being served at port 4000. Thus, any request sent from the client to the server in this case would be a Cross-Origin Request.

If we now try to add a student with id 101 and name Alex we get the much dreaded CORS error as expected:

We did not configure CORS on our backend. The browser sent a preflight request to the server and did not receive the appropriate headers in the response.

For requests that are not considered "simple" (such as this POST with a Content-Type of application/json), the browser first sends a preflight request to the server to determine if the actual request is safe to send. This preflight request is an HTTP OPTIONS request that includes specific headers, such as Origin, Access-Control-Request-Method, and Access-Control-Request-Headers. The server must respond to the preflight request with appropriate CORS headers indicating whether the actual request is allowed. These headers include Access-Control-Allow-Origin, Access-Control-Allow-Methods, Access-Control-Allow-Headers, and others.

Only after the browser receives a satisfactory response to the preflight request will it send the actual request (e.g., GET, POST, etc.). If the preflight request fails or if the server does not respond with the required CORS headers, the browser will block the actual request, preventing potential cross-origin security vulnerabilities.
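
Whether a request triggers a preflight depends on its method and headers. The function below is a simplified sketch of that decision, not the browser's full algorithm (the real rules have further conditions, for example on header value length). The function and constant names are hypothetical.

```javascript
const SIMPLE_METHODS = new Set(["GET", "HEAD", "POST"]);
const SAFELISTED_HEADERS = new Set([
  "accept",
  "accept-language",
  "content-language",
  "content-type",
]);
const SIMPLE_CONTENT_TYPES = new Set([
  "application/x-www-form-urlencoded",
  "multipart/form-data",
  "text/plain",
]);

// Simplified check: does this cross-origin request need a preflight?
function needsPreflight(method, headers = {}) {
  if (!SIMPLE_METHODS.has(method.toUpperCase())) return true;
  for (const [name, value] of Object.entries(headers)) {
    const lower = name.toLowerCase();
    if (!SAFELISTED_HEADERS.has(lower)) return true;
    if (lower === "content-type") {
      // Ignore parameters such as "; charset=utf-8".
      const mime = value.split(";")[0].trim().toLowerCase();
      if (!SIMPLE_CONTENT_TYPES.has(mime)) return true;
    }
  }
  return false;
}

console.log(needsPreflight("POST", { "Content-Type": "application/json" })); // true
console.log(needsPreflight("DELETE")); // true
console.log(needsPreflight("GET")); // false
```

This is why the add-student call in the example is preflighted: it is a POST, but its JSON content type takes it out of the "simple" category.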

Let us now create a middleware that will add the required CORS headers to the API response.

// CORS middleware
app.use((req, res, next) => {
  res.header("Access-Control-Allow-Origin", "http://127.0.0.1:4000");
  res.header(
    "Access-Control-Allow-Headers",
    "Origin, X-Requested-With, Content-Type, Accept"
  );
  // Allow specific methods
  res.header("Access-Control-Allow-Methods", "GET, POST, PUT, DELETE, OPTIONS");
  next();
});

The server code finally looks like this:

const express = require("express");

const app = express();
const port = 3000;

app.use(express.json());
app.use(express.urlencoded({ extended: false }));

let students = [];

// CORS middleware
app.use((req, res, next) => {
  res.header("Access-Control-Allow-Origin", "http://127.0.0.1:4000");
  res.header(
    "Access-Control-Allow-Headers",
    "Origin, X-Requested-With, Content-Type, Accept"
  );
  // Allow specific methods
  res.header("Access-Control-Allow-Methods", "GET, POST, PUT, DELETE, OPTIONS");
  next();
});

app.get("/get-student", (req, res) => {
  const idToSearch = req.query?.id;
  for (let i = 0; i < students.length; i++) {
    if (students[i]["id"] == idToSearch) {
      return res.status(200).send({ student: students[i] });
    }
  }
  return res.status(404).send("Not found");
});

app.post("/add-student", (req, res) => {
  const id = req.body?.id;
  const name = req.body?.name;
  students.push({ id, name });
  return res.status(200).send({ students, message: "Successfully added" });
});

app.delete("/delete-student", (req, res) => {
  const idToDelete = req.body?.id;
  for (let i = 0; i < students.length; i++) {
    if (students[i]["id"] == idToDelete) {
      students.splice(i, 1);
      return res.status(200).send({ students });
    }
  }
  return res.status(404).send("Not found");
});

app.listen(port, () => {
  console.log(`Server listening on port ${port}`);
});

Restart the server and try adding the student again. Now it works!

The preflight request received a response with all the required CORS headers this time and thus the browser sent the actual request.

The CORS Headers

In the middleware, we set the following three headers:

Access-Control-Allow-Origin
Access-Control-Allow-Headers
Access-Control-Allow-Methods

Let us look at each of these headers in detail:

Access-Control-Allow-Origin:

If the server includes Access-Control-Allow-Origin: * in the response header, it allows requests from any origin. This is generally not good practice and is considered a security risk unless you are intentionally building a public API that should be accessible from any origin. It lets any website read these responses, which removes the protection the Same-Origin Policy would otherwise give this resource (although browsers do not allow credentialed requests when the wildcard is used).
If the server includes Access-Control-Allow-Origin: <origin> in the response header, it allows requests only from the specified origin (it was set to http://127.0.0.1:4000 in the above example).
If the server does not include the Access-Control-Allow-Origin header in the response (or includes it with a different origin), the browser will block the request due to the Same-Origin Policy.
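
Since the header carries a single origin (or the wildcard), a common way to support several trusted frontends is to compare the incoming Origin header against an allow-list and echo it back only on a match. The middleware below is a sketch of that idea, not part of the original example: the second origin in the list and the function name are assumptions. It uses only setHeader from Node's response object, so it can be passed to app.use() in the Express server above.

```javascript
// Hypothetical allow-list; replace with the origins your frontend is served from.
const allowedOrigins = new Set([
  "http://127.0.0.1:4000",
  "http://localhost:4000",
]);

// Express-style middleware: echo the Origin back only if it is trusted.
function corsAllowList(req, res, next) {
  const origin = req.headers.origin;
  if (origin && allowedOrigins.has(origin)) {
    res.setHeader("Access-Control-Allow-Origin", origin);
    // Tell caches that the response differs depending on the Origin header.
    res.setHeader("Vary", "Origin");
  }
  next();
}

// Usage in the server above (assumption): app.use(corsAllowList);
```

Requests from origins outside the list simply receive no Access-Control-Allow-Origin header, so the browser blocks them as before.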

Access-Control-Allow-Headers:

Along with the preflight request, the browser includes the Access-Control-Request-Headers header, which lists the headers that the client wants to include in the actual request.
The server then responds to the preflight request, and if it allows the requested headers, it includes the Access-Control-Allow-Headers header in the response. This header contains a comma-separated list of the headers that the server allows (we set this header to "Origin, X-Requested-With, Content-Type, Accept" in the above example).

Access-Control-Allow-Methods:

In the preflight request, the browser includes the Access-Control-Request-Method header, which specifies the method that the client wants to use in the actual request. The server then responds to the preflight request, and if it allows the requested method, it includes the Access-Control-Allow-Methods header in the response. This header contains a comma-separated list of the HTTP methods that the server allows (we set this header to "GET, POST, PUT, DELETE, OPTIONS" in the above example).
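
Putting the three headers together, the browser's decision after a preflight can be approximated as follows. This is a deliberately simplified sketch with hypothetical names: it ignores wildcards in the method and header lists, credentials, and the methods and headers that browsers always allow, and it compares names case-insensitively for convenience.

```javascript
// Turn "GET, POST" into ["get", "post"].
function parseList(value) {
  return (value || "")
    .split(",")
    .map((item) => item.trim().toLowerCase())
    .filter(Boolean);
}

// Simplified check of a preflight response against the intended request.
function preflightAllows(request, responseHeaders) {
  const allowedOrigin = responseHeaders["access-control-allow-origin"];
  if (allowedOrigin !== "*" && allowedOrigin !== request.origin) return false;

  const methods = parseList(responseHeaders["access-control-allow-methods"]);
  if (!methods.includes(request.method.toLowerCase())) return false;

  const allowedHeaders = parseList(responseHeaders["access-control-allow-headers"]);
  return request.headers.every((h) => allowedHeaders.includes(h.toLowerCase()));
}

// The headers set by the middleware in this article.
const serverHeaders = {
  "access-control-allow-origin": "http://127.0.0.1:4000",
  "access-control-allow-headers": "Origin, X-Requested-With, Content-Type, Accept",
  "access-control-allow-methods": "GET, POST, PUT, DELETE, OPTIONS",
};

const addStudentRequest = {
  origin: "http://127.0.0.1:4000",
  method: "POST",
  headers: ["Content-Type"],
};

console.log(preflightAllows(addStudentRequest, serverHeaders)); // true
console.log(preflightAllows({ ...addStudentRequest, method: "PATCH" }, serverHeaders)); // false
```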

Conclusion:

If you feel this dive into CORS wasn't deep enough or want to explore further, MDN Web Docs would be the best place to continue reading: MDN Web Docs

The code used to demonstrate CORS in this article can be found here: GitHub

I hope this article helped you understand what CORS is and how you can deal with it the next time you see "CORS Error" flashing on your screen!

How to fetch data from REST APIs in Flutter? 💻

Umang Sinha — Sat, 19 Jun 2021 20:06:08 +0000

I remember being stuck with REST APIs when I was new to Flutter and programming in general. As a beginner I didn't know where to find the solutions to my problem. I was often advised to read the official documentation but those docs always looked very intimidating to me as a complete beginner. That is when I stumbled upon this beautiful community of people on the internet that are always ready to help out. After having gained so much I guess it's time to give back to this gorgeous community and that is why I am writing my first ever blog post 🤩

In this article, we'll try to fetch some dummy data from a REST API hosted by Reqres.in

Before we begin, add the following code into the main.dart file of your flutter app:

After you are done with this, your app should look something like this:

Before we can make our first HTTP request we need to install some packages. You can now head over to pub.dev and search for 'http'. The package that we are looking for is this.

In order to install this package you can follow the below mentioned steps:

Run the following command in your terminal:

$ flutter pub add http

After you have done this your IDE will run the flutter pub get command. In case it doesn't, you can manually do it by typing it into your terminal.

The http package has now been installed. In order to access it, we can import it as a library by adding the following line of code to the top of our main.dart file:

import 'package:http/http.dart' as http;

Now that everything is set up, we can start accessing the http library and use it to send HTTP requests to the REST API. Let's get coding! 🚀

Our next step would be to create a function that will fetch the data from the REST API and print it to the console. In order to keep things simple, let us name the function getData() which is exactly what it does - it gets data! This function will be an asynchronous function and will have the return type of Future<String>.

Wait what? What the heck is Future? Shouldn't the return type be just String?

Does that look weird to you? Don't worry! 🤝 I felt the same when I saw it for the first time. Let us try to understand it:

Future<String> can be thought of as a promise token that doesn't have any data right now but promises to provide a String in the future. A Future can be in one of two states: uncompleted or completed. It is uncompleted while it doesn't yet have the data that it promised to provide, and it becomes completed once the operation finishes, either with a value or with an error.

Inside the function, let us declare a final variable that stores the URL.

final url = Uri.parse('https://reqres.in/api/users?page=2');

We now send an HTTP request to the REST API with the help of the get() method that the http package offers and store it in a variable called response. This response will be of the type http.Response. We also specify in the header of our request that we want to receive a response in the JSON format.

final http.Response response = await http.get(
  url,
  headers: {'Accept': 'application/json'},
);

We can now print the response body to the console!

print(response.body);

The getData function should finally look like this:

There you go! We should now be able to receive data from the REST API and print it to the console. Just pass the getData function to the onPressed listener of the button and reload your app.

Now press the button on your app and voila! 🥳 Your console will print out some data from the REST API like this:
