<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: stringintech</title>
    <description>The latest articles on DEV Community by stringintech (@stringintech).</description>
    <link>https://dev.to/stringintech</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2180525%2F32932cf3-c168-44b4-95a2-32e9386aadf0.jpg</url>
      <title>DEV Community: stringintech</title>
      <link>https://dev.to/stringintech</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/stringintech"/>
    <language>en</language>
    <item>
      <title>Fuzzing Bitcoin Core using AFL++ on Apple Silicon</title>
      <dc:creator>stringintech</dc:creator>
      <pubDate>Sun, 12 Oct 2025 11:25:56 +0000</pubDate>
      <link>https://dev.to/stringintech/fuzzing-bitcoin-core-using-afl-on-apple-silicon-23n</link>
      <guid>https://dev.to/stringintech/fuzzing-bitcoin-core-using-afl-on-apple-silicon-23n</guid>
      <description>&lt;p&gt;Steps to build and run a Bitcoin Core fuzz target on Apple Silicon using &lt;a href="https://github.com/AFLplusplus/AFLplusplus" rel="noopener noreferrer"&gt;AFL++&lt;/a&gt; (tested on M1 and M3):&lt;/p&gt;

&lt;h2&gt;Installation&lt;/h2&gt;

&lt;p&gt;First, install AFL++ via Homebrew:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;brew &lt;span class="nb"&gt;install &lt;/span&gt;afl++
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This installs the AFL++ instrumentation compiler &lt;code&gt;afl-clang-fast++&lt;/code&gt; (located in &lt;code&gt;/opt/homebrew/opt/afl++/bin&lt;/code&gt;) along with Homebrew's LLVM (&lt;code&gt;/opt/homebrew/Cellar/llvm&lt;/code&gt;). Note that &lt;code&gt;afl-clang-lto&lt;/code&gt; is &lt;a href="https://github.com/AFLplusplus/AFLplusplus/blob/a3dbd38977fa1e87d29a9222aa7647422fdb0d43/docs/INSTALL.md?plain=1#L157" rel="noopener noreferrer"&gt;not available on macOS&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; &lt;code&gt;afl-clang-fast++&lt;/code&gt; is a wrapper around Homebrew's LLVM clang compiler (&lt;code&gt;/opt/homebrew/Cellar/llvm/&amp;lt;VERSION&amp;gt;/bin/clang++&lt;/code&gt;). When you compile with it, it produces binaries with the required runtime instrumentation that enables the AFL++ fuzz engine to track code coverage and guide fuzzing.&lt;/p&gt;
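&lt;p&gt;As a quick sanity check (my addition; the paths assume a default Homebrew install), you can confirm the wrapper really injects instrumentation by compiling a trivial program and running it under &lt;code&gt;afl-showmap&lt;/code&gt;, which executes the binary once and dumps the coverage map the instrumentation records:&lt;/p&gt;

```shell
# Hypothetical sanity check (assumes afl++ installed via Homebrew at the
# default prefix): compile a trivial program with the AFL++ wrapper.
printf 'int main() { return 0; }\n' > /tmp/demo.cpp
/opt/homebrew/bin/afl-clang-fast++ -o /tmp/demo /tmp/demo.cpp

# afl-showmap runs the target once and writes the recorded edge-coverage
# map; a non-empty map confirms the instrumentation is active.
afl-showmap -o /tmp/demo.map -- /tmp/demo
wc -c /tmp/demo.map
```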

&lt;h2&gt;Configuration&lt;/h2&gt;

&lt;p&gt;Configure the fuzz build with &lt;code&gt;afl-clang-fast&lt;/code&gt;, substituting it for &lt;code&gt;afl-clang-lto&lt;/code&gt; in the command from &lt;a href="https://github.com/bitcoin/bitcoin/blob/200150beba6601237036bc01ee10f6a0a2246c3d/doc/fuzzing.md?plain=1#L221-L224" rel="noopener noreferrer"&gt;Bitcoin Core's fuzzing docs&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cmake &lt;span class="nt"&gt;-B&lt;/span&gt; build_fuzz &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-DCMAKE_C_COMPILER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"/opt/homebrew/bin/afl-clang-fast"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-DCMAKE_CXX_COMPILER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"/opt/homebrew/bin/afl-clang-fast++"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-DBUILD_FOR_FUZZING&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;ON
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;Troubleshooting LLVM Issues&lt;/h2&gt;

&lt;p&gt;You may run into the same linker error I did when building with &lt;code&gt;cmake --build build_fuzz&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;Undefined symbols &lt;span class="k"&gt;for &lt;/span&gt;architecture arm64:
  &lt;span class="s2"&gt;"std::__1::__hash_memory(void const*, unsigned long)"&lt;/span&gt;, referenced from:
      Arena::Arena&lt;span class="o"&gt;(&lt;/span&gt;void&lt;span class="k"&gt;*&lt;/span&gt;, unsigned long, unsigned long&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="k"&gt;in &lt;/span&gt;libbitcoin_util.a[32]&lt;span class="o"&gt;(&lt;/span&gt;lockedpool.cpp.o&lt;span class="o"&gt;)&lt;/span&gt;
      ...
ld: symbol&lt;span class="o"&gt;(&lt;/span&gt;s&lt;span class="o"&gt;)&lt;/span&gt; not found &lt;span class="k"&gt;for &lt;/span&gt;architecture arm64
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is caused by a mismatch: &lt;code&gt;afl-clang-fast++&lt;/code&gt; compiles with Homebrew's LLVM headers (which declare &lt;code&gt;__hash_memory&lt;/code&gt;), but the linker defaults to Apple's older system libc++ (which lacks this symbol).&lt;/p&gt;

&lt;p&gt;Fix by explicitly using Homebrew's LLVM for both compilation and linking:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cmake &lt;span class="nt"&gt;-B&lt;/span&gt; build_fuzz &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-DCMAKE_C_COMPILER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"/opt/homebrew/bin/afl-clang-fast"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-DCMAKE_CXX_COMPILER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"/opt/homebrew/bin/afl-clang-fast++"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-DBUILD_FOR_FUZZING&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;ON &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-DCMAKE_CXX_FLAGS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"-stdlib=libc++ -I/opt/homebrew/Cellar/llvm/&amp;lt;YOUR_VERSION&amp;gt;/include/c++/v1"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-DCMAKE_EXE_LINKER_FLAGS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"-L/opt/homebrew/Cellar/llvm/&amp;lt;YOUR_VERSION&amp;gt;/lib/c++ -lc++ -lc++abi"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Replace &lt;code&gt;&amp;lt;YOUR_VERSION&amp;gt;&lt;/code&gt; with your installed LLVM version (e.g., 21.1.2).&lt;/p&gt;
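&lt;p&gt;A small convenience (my addition, assuming a standard Homebrew setup): the &lt;code&gt;/opt/homebrew/opt&lt;/code&gt; symlinks always track the currently installed keg, so you can let &lt;code&gt;brew --prefix&lt;/code&gt; fill in the paths instead of hardcoding &lt;code&gt;&amp;lt;YOUR_VERSION&amp;gt;&lt;/code&gt;:&lt;/p&gt;

```shell
# Version-independent paths via Homebrew's opt/ symlink (assumes the
# default Homebrew prefix and an installed llvm keg).
LLVM_PREFIX="$(brew --prefix llvm)"   # e.g. /opt/homebrew/opt/llvm
CXX_FLAGS="-stdlib=libc++ -I${LLVM_PREFIX}/include/c++/v1"
LINKER_FLAGS="-L${LLVM_PREFIX}/lib/c++ -lc++ -lc++abi"
echo "$CXX_FLAGS"
echo "$LINKER_FLAGS"
```

&lt;p&gt;These values can then be passed to &lt;code&gt;-DCMAKE_CXX_FLAGS&lt;/code&gt; and &lt;code&gt;-DCMAKE_EXE_LINKER_FLAGS&lt;/code&gt; in place of the hardcoded versions.&lt;/p&gt;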

&lt;p&gt;Now build:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cmake &lt;span class="nt"&gt;--build&lt;/span&gt; build_fuzz
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;Running the Fuzzer&lt;/h2&gt;

&lt;p&gt;To run a specific fuzz target (I'm running the &lt;code&gt;cmpctblock&lt;/code&gt; target introduced in &lt;a href="https://github.com/bitcoin/bitcoin/pull/33300" rel="noopener noreferrer"&gt;PR#33300&lt;/a&gt;):&lt;/p&gt;

&lt;p&gt;Create input and output directories:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; fuzz-inputs/ fuzz-outputs/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Generate initial test input:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;head&lt;/span&gt; &lt;span class="nt"&gt;-c&lt;/span&gt; 1000 /dev/urandom &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; fuzz-inputs/input.dat
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Configure the system for AFL++ (requires sudo):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;afl-system-config
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Start fuzzing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;FUZZ&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;cmpctblock afl-fuzz &lt;span class="nt"&gt;-i&lt;/span&gt; fuzz-inputs &lt;span class="nt"&gt;-o&lt;/span&gt; fuzz-outputs &lt;span class="nt"&gt;--&lt;/span&gt; build_fuzz/bin/fuzz
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should see AFL++ running successfully:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;american fuzzy lop ++4.33c &lt;span class="o"&gt;{&lt;/span&gt;default&lt;span class="o"&gt;}&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;build_fuzz/bin/fuzz&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;explore]
┌─ process timing ────────────────────────────────────┬─ overall results ────┐
│        run &lt;span class="nb"&gt;time&lt;/span&gt; : 0 days, 0 hrs, 0 min, 20 sec      │  cycles &lt;span class="k"&gt;done&lt;/span&gt; : 0     │
│   last new find : none seen yet                     │ corpus count : 1     │
│last saved crash : none seen yet                     │saved crashes : 0     │
│ last saved hang : none seen yet                     │  saved hangs : 0     │
├─ cycle progress ─────────────────────┬─ map coverage┴──────────────────────┤
│  now processing : 0.0 &lt;span class="o"&gt;(&lt;/span&gt;0.0%&lt;span class="o"&gt;)&lt;/span&gt;         │    map density : 1.45% / 1.47%      │
│  runs timed out : 0 &lt;span class="o"&gt;(&lt;/span&gt;0.00%&lt;span class="o"&gt;)&lt;/span&gt;          │ count coverage : 1.11 bits/tuple    │
├─ stage progress ─────────────────────┼─ findings &lt;span class="k"&gt;in &lt;/span&gt;depth ─────────────────┤
│  now trying : trim 4/4               │ favored items : 1 &lt;span class="o"&gt;(&lt;/span&gt;100.00%&lt;span class="o"&gt;)&lt;/span&gt;         │
│ stage execs : 87/250 &lt;span class="o"&gt;(&lt;/span&gt;34.80%&lt;span class="o"&gt;)&lt;/span&gt;        │  new edges on : 1 &lt;span class="o"&gt;(&lt;/span&gt;100.00%&lt;span class="o"&gt;)&lt;/span&gt;         │
│ total execs : 332                    │ total crashes : 0 &lt;span class="o"&gt;(&lt;/span&gt;0 saved&lt;span class="o"&gt;)&lt;/span&gt;         │
│  &lt;span class="nb"&gt;exec &lt;/span&gt;speed : 10.16/sec &lt;span class="o"&gt;(&lt;/span&gt;zzzz...&lt;span class="o"&gt;)&lt;/span&gt;    │  total tmouts : 0 &lt;span class="o"&gt;(&lt;/span&gt;0 saved&lt;span class="o"&gt;)&lt;/span&gt;         │
├─ fuzzing strategy yields ────────────┴─────────────┬─ item geometry ───────┤
│   bit flips : 0/0, 0/0, 0/0                        │    levels : 1         │
│  byte flips : 0/0, 0/0, 0/0                        │   pending : 1         │
│ arithmetics : 0/0, 0/0, 0/0                        │  pend fav : 1         │
│  known ints : 0/0, 0/0, 0/0                        │ own finds : 0         │
│  dictionary : 0/0, 0/0, 0/0, 0/0                   │  imported : 0         │
│havoc/splice : 0/0, 0/0                             │ stability : 99.08%    │
│py/custom/rq : unused, unused, unused, unused       ├───────────────────────┘
│    trim/eff : n/a, n/a                             │             &lt;span class="o"&gt;[&lt;/span&gt;cpu: 24%]
└─ strategy: explore ────────── state: started :-&lt;span class="o"&gt;)&lt;/span&gt; ──┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Setting the &lt;code&gt;AFL_DEBUG&lt;/code&gt; and &lt;code&gt;AFL_NO_UI&lt;/code&gt; environment variables produces plain, more readable debug logs for troubleshooting:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;FUZZ&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;cmpctblock &lt;span class="nv"&gt;AFL_DEBUG&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1 &lt;span class="nv"&gt;AFL_NO_UI&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1 afl-fuzz &lt;span class="nt"&gt;-i&lt;/span&gt; fuzz-inputs &lt;span class="nt"&gt;-o&lt;/span&gt; fuzz-outputs &lt;span class="nt"&gt;--&lt;/span&gt; build_fuzz/bin/fuzz
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="o"&gt;[&lt;/span&gt;+] Enabled environment variable AFL_DEBUG with value 1
&lt;span class="o"&gt;[&lt;/span&gt;+] Enabled environment variable AFL_DEBUG with value 1
&lt;span class="o"&gt;[&lt;/span&gt;+] Enabled environment variable AFL_NO_UI with value 1
afl-fuzz++4.33c based on afl by Michal Zalewski and a large online community
&lt;span class="o"&gt;[&lt;/span&gt;+] AFL++ is maintained by Marc &lt;span class="s2"&gt;"van Hauser"&lt;/span&gt; Heuse, Dominik Maier, Andrea Fioraldi and Heiko &lt;span class="s2"&gt;"hexcoder"&lt;/span&gt; Eißfeldt
&lt;span class="o"&gt;[&lt;/span&gt;+] AFL++ is open &lt;span class="nb"&gt;source&lt;/span&gt;, get it at https://github.com/AFLplusplus/AFLplusplus
&lt;span class="o"&gt;[&lt;/span&gt;+] NOTE: AFL++ &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; v3 has changed defaults and behaviours - see README.md
&lt;span class="o"&gt;[&lt;/span&gt;+] No &lt;span class="nt"&gt;-M&lt;/span&gt;/-S &lt;span class="nb"&gt;set&lt;/span&gt;, autoconfiguring &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="s2"&gt;"-S default"&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="k"&gt;*&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt; Getting to work...
&lt;span class="o"&gt;[&lt;/span&gt;+] Using exploration-based constant power schedule &lt;span class="o"&gt;(&lt;/span&gt;EXPLORE&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;+] Enabled testcache with 50 MB
&lt;span class="o"&gt;[&lt;/span&gt;+] Generating fuzz data with a length of &lt;span class="nv"&gt;min&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1 &lt;span class="nv"&gt;max&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1048576
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="k"&gt;*&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt; Checking CPU scaling governor...
&lt;span class="o"&gt;[!]&lt;/span&gt; WARNING: Could not check CPU min frequency
&lt;span class="o"&gt;[&lt;/span&gt;+] Disabling the UI because AFL_NO_UI is set.
&lt;span class="o"&gt;[&lt;/span&gt;+] You have 8 CPU cores and 3 runnable tasks &lt;span class="o"&gt;(&lt;/span&gt;utilization: 38%&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="nb"&gt;.&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;+] Try parallel &lt;span class="nb"&gt;jobs&lt;/span&gt; - see /opt/homebrew/Cellar/afl++/4.33c_1/share/doc/afl/fuzzing_in_depth.md#c-using-multiple-cores
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="k"&gt;*&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt; Setting up output directories...
&lt;span class="o"&gt;[&lt;/span&gt;+] Output directory exists but deemed OK to reuse.
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="k"&gt;*&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt; Deleting old session data...
&lt;span class="o"&gt;[&lt;/span&gt;+] Output &lt;span class="nb"&gt;dir &lt;/span&gt;cleanup successful.
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="k"&gt;*&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt; Validating target binary...
&lt;span class="o"&gt;[&lt;/span&gt;+] Persistent mode binary detected.
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="k"&gt;*&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt; Scanning &lt;span class="s1"&gt;'fuzz-inputs'&lt;/span&gt;...
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="k"&gt;*&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt; Creating hard links &lt;span class="k"&gt;for &lt;/span&gt;all input files...
&lt;span class="o"&gt;[&lt;/span&gt;+] Loaded a total of 1 seeds.
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="k"&gt;*&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt; Spinning up the fork server...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;Troubleshooting Fork Server Issues&lt;/h2&gt;

&lt;p&gt;With the fork server optimization enabled, you may see unexpected worker process terminations. I investigated the crashes these terminations caused in the &lt;code&gt;cmpctblock&lt;/code&gt; fuzz harness and documented my findings in &lt;a href="https://github.com/bitcoin/bitcoin/pull/33300#discussion_r2417231848" rel="noopener noreferrer"&gt;this GitHub comment&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;To avoid such issues, &lt;a href="https://github.com/AFLplusplus/AFLplusplus/blob/474ff18ba2a7999a518a4d194fcd5a1f87c3625d/docs/INSTALL.md?plain=1#L168-L170" rel="noopener noreferrer"&gt;disable the fork server optimization&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;FUZZ&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;cmpctblock &lt;span class="nv"&gt;AFL_NO_FORKSRV&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1 afl-fuzz &lt;span class="nt"&gt;-i&lt;/span&gt; fuzz-inputs &lt;span class="nt"&gt;-o&lt;/span&gt; fuzz-outputs &lt;span class="nt"&gt;--&lt;/span&gt; build_fuzz/bin/fuzz
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>bitcoin</category>
    </item>
    <item>
      <title>How Bitcoin's 10-Minute Block Interval Ends Up Being 20</title>
      <dc:creator>stringintech</dc:creator>
      <pubDate>Sat, 12 Apr 2025 13:24:24 +0000</pubDate>
      <link>https://dev.to/stringintech/how-bitcoins-10-minute-block-interval-ends-up-being-20-16mf</link>
      <guid>https://dev.to/stringintech/how-bitcoins-10-minute-block-interval-ends-up-being-20-16mf</guid>
<description>&lt;p&gt;Earlier, I was reading an interesting &lt;a href="https://r6.ca/blog/20180225T160548Z.html" rel="noopener noreferrer"&gt;mathematical explanation&lt;/a&gt; of a paradox in the intervals between mined Bitcoin blocks. The interval is 10 minutes on average; yet if you show up at a random time, the expected wait until the next block is 10 minutes, no matter how long you have already been waiting. Likewise, the previous block was, on average, mined 10 minutes before you arrived. Sampled this way, you get 10 minutes behind you and 10 minutes ahead, so the interval between the two consecutive blocks surrounding your arrival averages 20 minutes, not the 10 you might initially expect!&lt;/p&gt;

&lt;p&gt;To test what the article was saying, I had an idea: generate lots of intervals averaging 10 minutes and string them together into a timeline. Then generate a bunch of random points along this timeline and, for each one, find the interval it falls into and measure the distance from the point to the interval's start and to its end. Averaging these distances should show that, on average, the end of the interval is 10 minutes ahead, the start is 10 minutes behind, and the full interval adds up to 20 minutes.&lt;/p&gt;

&lt;p&gt;I wrote a little piece of &lt;a href="https://github.com/stringintech/btc-vault/blob/main/Chaincode%20Seminars/Bitcoin/Mining%20and%20Network%20Block%20Propagation/why-20-minutes/main.cpp" rel="noopener noreferrer"&gt;code&lt;/a&gt; to do just that, and, unsurprisingly, the results matched my expectations:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Average block interval: 10.0339
Avg time to next block: 10.0594
Avg time since last block: 10.1122
Avg total interval length: 20.1717
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="cp"&gt;#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;iostream&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;vector&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;random&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;cmath&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
&lt;/span&gt;
&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;INTERVAL_COUNT&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;100000&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;SAMPLE_COUNT&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;100000&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="kt"&gt;double&lt;/span&gt; &lt;span class="n"&gt;AVG_BLOCK_TIME&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;10.0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="c1"&gt;// Setup random number generation&lt;/span&gt;
    &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;random_device&lt;/span&gt; &lt;span class="n"&gt;rd&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;mt19937&lt;/span&gt; &lt;span class="n"&gt;gen&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rd&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
    &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;uniform_real_distribution&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;uniform_dist&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// Generate exponentially distributed block intervals&lt;/span&gt;
    &lt;span class="k"&gt;auto&lt;/span&gt; &lt;span class="n"&gt;random_interval&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="p"&gt;]()&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;AVG_BLOCK_TIME&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;1.0&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;uniform_dist&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;gen&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
    &lt;span class="p"&gt;};&lt;/span&gt;

    &lt;span class="c1"&gt;// Create a timeline of block boundaries&lt;/span&gt;
    &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;double&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;block_times&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;};&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;INTERVAL_COUNT&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;block_times&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;push_back&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;block_times&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;back&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;random_interval&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c1"&gt;// Calculate the actual average interval&lt;/span&gt;
    &lt;span class="kt"&gt;double&lt;/span&gt; &lt;span class="n"&gt;timeline_end&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;block_times&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;back&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="kt"&gt;double&lt;/span&gt; &lt;span class="n"&gt;achieved_mean&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;timeline_end&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;INTERVAL_COUNT&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;cout&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class="s"&gt;"Average block interval: "&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;achieved_mean&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;endl&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="c1"&gt;// Sample random points on the timeline&lt;/span&gt;
    &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;uniform_real_distribution&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;timeline_dist&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;timeline_end&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="kt"&gt;double&lt;/span&gt; &lt;span class="n"&gt;time_to_next&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kt"&gt;double&lt;/span&gt; &lt;span class="n"&gt;time_since_last&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kt"&gt;double&lt;/span&gt; &lt;span class="n"&gt;total_interval&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;SAMPLE_COUNT&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;// Pick a random point in time&lt;/span&gt;
        &lt;span class="kt"&gt;double&lt;/span&gt; &lt;span class="n"&gt;point&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;timeline_dist&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;gen&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

        &lt;span class="c1"&gt;// Find which interval contains this point&lt;/span&gt;
        &lt;span class="k"&gt;auto&lt;/span&gt; &lt;span class="n"&gt;it&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;upper_bound&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;block_times&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;begin&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;block_times&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;end&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;point&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;idx&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;it&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;block_times&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;begin&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

        &lt;span class="c1"&gt;// Calculate times to adjacent blocks&lt;/span&gt;
        &lt;span class="kt"&gt;double&lt;/span&gt; &lt;span class="n"&gt;next_time&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;block_times&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;idx&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;point&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="kt"&gt;double&lt;/span&gt; &lt;span class="n"&gt;prev_time&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;point&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;block_times&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;idx&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
        &lt;span class="kt"&gt;double&lt;/span&gt; &lt;span class="n"&gt;interval_length&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;block_times&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;idx&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;block_times&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;idx&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;

        &lt;span class="n"&gt;time_to_next&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;next_time&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="n"&gt;time_since_last&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;prev_time&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="n"&gt;total_interval&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;interval_length&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;cout&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class="s"&gt;"Avg time to next block: "&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;time_to_next&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;SAMPLE_COUNT&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;endl&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;cout&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class="s"&gt;"Avg time since last block: "&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;time_since_last&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;SAMPLE_COUNT&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;endl&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;cout&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class="s"&gt;"Avg total interval length: "&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;total_interval&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;SAMPLE_COUNT&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;endl&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>bitcoin</category>
      <category>blockchain</category>
    </item>
    <item>
      <title>PostgreSQL Buffer Cache: A Practical Guide</title>
      <dc:creator>stringintech</dc:creator>
      <pubDate>Sat, 07 Dec 2024 13:13:49 +0000</pubDate>
      <link>https://dev.to/stringintech/postgresql-buffer-cache-a-practical-guide-3251</link>
      <guid>https://dev.to/stringintech/postgresql-buffer-cache-a-practical-guide-3251</guid>
      <description>&lt;p&gt;Before adding indexes or application-level caching to optimize PostgreSQL performance, it's worth understanding how a relational database like PostgreSQL manages memory. While we'll focus on PostgreSQL's implementation, the concepts discussed here are fundamental to understanding memory management in most relational database systems.&lt;/p&gt;

&lt;p&gt;PostgreSQL keeps frequently accessed data in memory, sometimes providing the performance boost we need without the additional complexity of introducing more fine-grained caches at the application level. Let's build a basic picture of this mechanism through practical examples to better inform our optimization decisions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Understanding the Fundamentals
&lt;/h2&gt;

&lt;p&gt;Before diving into practical examples, let's clarify some key PostgreSQL concepts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Relation&lt;/strong&gt;: Any database object that contains rows. Tables, indexes, and sequences are all relations in PostgreSQL.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Page&lt;/strong&gt;: The basic unit of storage in PostgreSQL (typically 8KB). Each relation is stored as a collection of pages on disk.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Buffer&lt;/strong&gt;: When a page is loaded into memory, it becomes a buffer. Think of buffers as the in-memory representation of disk pages.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Buffer Cache&lt;/strong&gt;: A shared memory area where PostgreSQL keeps frequently accessed pages.&lt;/li&gt;
&lt;/ul&gt;
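&lt;p&gt;To see these numbers on your own server, you can query them directly. A quick sketch (the page size is a build-time setting, typically 8192 bytes):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Page (block) size this server was built with; typically 8192 bytes
SHOW block_size;

-- Total memory reserved for the shared buffer cache
SHOW shared_buffers;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;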

&lt;h2&gt;
  
  
  Our Toolkit: PostgreSQL Buffer Cache Monitoring with &lt;code&gt;pg_buffercache&lt;/code&gt; and &lt;code&gt;pg_class&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;To explore buffer cache behavior, we'll use two main PostgreSQL tools:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;pg_buffercache&lt;/code&gt; extension&lt;/strong&gt;: Provides real-time visibility into shared buffer cache content, allowing us to track which pages are in memory and their current state. See the PostgreSQL &lt;a href="https://www.postgresql.org/docs/current/pgbuffercache.html" rel="noopener noreferrer"&gt;documentation&lt;/a&gt; for more details.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;pg_class&lt;/code&gt; system catalog&lt;/strong&gt;: Contains metadata about database objects (tables, indexes, etc.)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Let's start by installing the &lt;code&gt;pg_buffercache&lt;/code&gt; extension:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;EXTENSION&lt;/span&gt; &lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;EXISTS&lt;/span&gt; &lt;span class="n"&gt;pg_buffercache&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Practical Exploration
&lt;/h2&gt;

&lt;p&gt;We'll follow these steps to understand buffer cache behavior:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Create a test table with predictable data size&lt;/li&gt;
&lt;li&gt;Observe how data is stored in pages&lt;/li&gt;
&lt;li&gt;Use &lt;code&gt;pg_buffercache&lt;/code&gt; to observe how pages are loaded into memory during queries&lt;/li&gt;
&lt;li&gt;Add an index to see how it affects page loading patterns&lt;/li&gt;
&lt;li&gt;Track how buffer cache state changes when we modify data&lt;/li&gt;
&lt;li&gt;See how system processes handle dirty (modified) pages&lt;/li&gt;
&lt;li&gt;Compare query performance for cached vs uncached data&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Setting Up Our Test Environment
&lt;/h3&gt;

&lt;p&gt;Let's create a test table &lt;strong&gt;without&lt;/strong&gt; any indexes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;buffer_test&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt;      &lt;span class="nb"&gt;SERIAL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;data&lt;/span&gt;    &lt;span class="nb"&gt;TEXT&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- Insert 100k rows with 1KB data each&lt;/span&gt;
&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;buffer_test&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;repeat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'x'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;generate_series&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;100000&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Understanding Table Size
&lt;/h3&gt;

&lt;p&gt;First, let's examine our table's size using &lt;code&gt;pg_class&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;relpages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;reltuples&lt;/span&gt; 
&lt;span class="k"&gt;FROM&lt;/span&gt;   &lt;span class="n"&gt;pg_class&lt;/span&gt; 
&lt;span class="k"&gt;WHERE&lt;/span&gt;  &lt;span class="n"&gt;relname&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'buffer_test'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;relpages&lt;/th&gt;
&lt;th&gt;reltuples&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The zero values indicate that table statistics haven't been updated. Let's fix that:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;ANALYZE&lt;/span&gt; &lt;span class="n"&gt;buffer_test&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;relpages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;reltuples&lt;/span&gt; 
&lt;span class="k"&gt;FROM&lt;/span&gt;   &lt;span class="n"&gt;pg_class&lt;/span&gt; 
&lt;span class="k"&gt;WHERE&lt;/span&gt;  &lt;span class="n"&gt;relname&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'buffer_test'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;relpages&lt;/th&gt;
&lt;th&gt;reltuples&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;14,286&lt;/td&gt;
&lt;td&gt;100,000&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Here:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;relpages&lt;/code&gt;: Number of disk pages the table uses&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;reltuples&lt;/code&gt;: Estimated number of rows&lt;/li&gt;
&lt;/ul&gt;
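&lt;p&gt;These numbers are consistent with our setup: 100,000 rows of roughly 1KB each fit about 7 per 8KB page, giving 100,000 / 7 ≈ 14,286 pages (about 112MB). As a sanity check, we can compare the statistics-based estimate against the actual on-disk size (exact byte counts will vary slightly):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;SELECT relpages::bigint * 8192                         AS bytes_from_stats,
       pg_relation_size('buffer_test')                 AS bytes_on_disk,
       pg_size_pretty(pg_relation_size('buffer_test')) AS pretty_size
FROM   pg_class
WHERE  relname = 'buffer_test';
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;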

&lt;h3&gt;
  
  
  Monitoring Cache Behavior
&lt;/h3&gt;

&lt;p&gt;Let's verify our cache is empty:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;    &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="k"&gt;FROM&lt;/span&gt;      &lt;span class="n"&gt;pg_buffercache&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;
&lt;span class="k"&gt;JOIN&lt;/span&gt;      &lt;span class="n"&gt;pg_class&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt; 
  &lt;span class="k"&gt;ON&lt;/span&gt;      &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;relfilenode&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;relfilenode&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt;     &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;relname&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'buffer_test'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;count&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The &lt;code&gt;relfilenode&lt;/code&gt; column in &lt;code&gt;pg_buffercache&lt;/code&gt; helps us identify which buffers belong to our table. Now, let's query a specific row:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;buffer_test&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;70000&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Check cache after the query&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;    &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="k"&gt;FROM&lt;/span&gt;      &lt;span class="n"&gt;pg_buffercache&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;
&lt;span class="k"&gt;JOIN&lt;/span&gt;      &lt;span class="n"&gt;pg_class&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt; 
  &lt;span class="k"&gt;ON&lt;/span&gt;      &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;relfilenode&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;relfilenode&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt;     &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;relname&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'buffer_test'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;count&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;32&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;You may have expected only one page to be loaded into memory, since the row we queried lives on a single page. That is not the case. Because we have not yet created an index on the id column, the database cannot locate that one page directly; it has to perform a sequential scan, reading through the table's pages from the beginning until it finds our target row. How many of those pages end up in the cache depends on various factors, including the database's buffer replacement strategy.&lt;/p&gt;
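&lt;p&gt;We can confirm the sequential scan by asking the planner for its strategy (a sketch; cost numbers will differ on your machine):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;EXPLAIN
SELECT * FROM buffer_test WHERE id = 70000;

-- Seq Scan on buffer_test  (cost=... rows=... width=...)
--   Filter: (id = 70000)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;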

&lt;h3&gt;
  
  
  Adding an Index
&lt;/h3&gt;

&lt;p&gt;Let's add an index:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;buffer_test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now restart the PostgreSQL server to clear the cache. For example, here is how I did it on my macOS installation using &lt;a href="https://www.postgresql.org/docs/current/app-pg-ctl.html" rel="noopener noreferrer"&gt;pg_ctl&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo&lt;/span&gt; &lt;span class="nt"&gt;-u&lt;/span&gt; postgres pg_ctl restart &lt;span class="nt"&gt;-D&lt;/span&gt; /Library/PostgreSQL/17/data
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After restart, query the same row:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;    &lt;span class="n"&gt;ctid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; 
&lt;span class="k"&gt;FROM&lt;/span&gt;      &lt;span class="n"&gt;buffer_test&lt;/span&gt; 
&lt;span class="k"&gt;WHERE&lt;/span&gt;     &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;70000&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;ctid&lt;/th&gt;
&lt;th&gt;id&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;(9999,7)&lt;/td&gt;
&lt;td&gt;70000&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The &lt;code&gt;ctid&lt;/code&gt; (Tuple ID) is a special system column in PostgreSQL that represents the physical location of a row &lt;strong&gt;version&lt;/strong&gt; within its table. Every row in a PostgreSQL table has a Tuple ID that consists of two numbers: the block number (or page number) and the tuple index within that block. Here &lt;code&gt;ctid&lt;/code&gt; shows our row is on page 9999. Let's check the buffer cache:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;    &lt;span class="n"&gt;bufferid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
          &lt;span class="n"&gt;relblocknumber&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
          &lt;span class="n"&gt;isdirty&lt;/span&gt; 
&lt;span class="k"&gt;FROM&lt;/span&gt;      &lt;span class="n"&gt;pg_buffercache&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;  
&lt;span class="k"&gt;JOIN&lt;/span&gt;      &lt;span class="n"&gt;pg_class&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt; 
  &lt;span class="k"&gt;ON&lt;/span&gt;      &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;relfilenode&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;relfilenode&lt;/span&gt;  
&lt;span class="k"&gt;WHERE&lt;/span&gt;     &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;relname&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'buffer_test'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;bufferid&lt;/th&gt;
&lt;th&gt;relblocknumber&lt;/th&gt;
&lt;th&gt;isdirty&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;149&lt;/td&gt;
&lt;td&gt;9999&lt;/td&gt;
&lt;td&gt;false&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;With the index, PostgreSQL loaded only the needed page. Key columns used here:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;bufferid&lt;/code&gt;: Unique identifier for the buffer in shared memory&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;relblocknumber&lt;/code&gt;: Page number within the relation&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;isdirty&lt;/code&gt;: Indicates if the page has been modified&lt;/li&gt;
&lt;/ul&gt;
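&lt;p&gt;Beyond per-table lookups, &lt;code&gt;pg_buffercache&lt;/code&gt; can also summarize the cache as a whole. For example, a quick sketch of how full and how dirty it currently is (&lt;code&gt;relfilenode&lt;/code&gt; is NULL for unused buffers):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;SELECT COUNT(*)                                        AS buffers_total,
       COUNT(*) FILTER (WHERE relfilenode IS NOT NULL) AS buffers_used,
       COUNT(*) FILTER (WHERE isdirty)                 AS buffers_dirty
FROM   pg_buffercache;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;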

&lt;h3&gt;
  
  
  Observing Dirty Pages
&lt;/h3&gt;

&lt;p&gt;Let's modify our row:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;UPDATE&lt;/span&gt;    &lt;span class="n"&gt;buffer_test&lt;/span&gt; 
&lt;span class="k"&gt;SET&lt;/span&gt;       &lt;span class="k"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'modified data'&lt;/span&gt; 
&lt;span class="k"&gt;WHERE&lt;/span&gt;     &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;70000&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;SELECT&lt;/span&gt;    &lt;span class="n"&gt;bufferid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
          &lt;span class="n"&gt;relblocknumber&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
          &lt;span class="n"&gt;isdirty&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;      &lt;span class="n"&gt;pg_buffercache&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;
&lt;span class="k"&gt;JOIN&lt;/span&gt;      &lt;span class="n"&gt;pg_class&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt; 
  &lt;span class="k"&gt;ON&lt;/span&gt;      &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;relfilenode&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;relfilenode&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt;     &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;relname&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'buffer_test'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;bufferid&lt;/th&gt;
&lt;th&gt;relblocknumber&lt;/th&gt;
&lt;th&gt;isdirty&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;149&lt;/td&gt;
&lt;td&gt;9999&lt;/td&gt;
&lt;td&gt;true&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The page is now marked dirty, indicating pending changes.&lt;/p&gt;

&lt;h3&gt;
  
  
  Understanding Checkpoints
&lt;/h3&gt;

&lt;p&gt;Calling &lt;code&gt;CHECKPOINT&lt;/code&gt; forces dirty pages to be written to disk. By default, PostgreSQL also runs automatic checkpoints:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Every &lt;code&gt;checkpoint_timeout&lt;/code&gt; interval (default: 5 minutes)&lt;/li&gt;
&lt;li&gt;When the &lt;a href="https://www.postgresql.org/docs/current/wal-intro.html" rel="noopener noreferrer"&gt;Write-Ahead Log&lt;/a&gt; (WAL) reaches &lt;code&gt;max_wal_size&lt;/code&gt; (default: 1 GB)&lt;/li&gt;
&lt;/ol&gt;
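&lt;p&gt;You can check what these thresholds are set to on your server:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;SHOW checkpoint_timeout;  -- default: 5min
SHOW max_wal_size;        -- default: 1GB
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;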

&lt;p&gt;Now let's force a checkpoint to write dirty pages to disk:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CHECKPOINT&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Check cache state&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;    &lt;span class="n"&gt;bufferid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
          &lt;span class="n"&gt;relblocknumber&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
          &lt;span class="n"&gt;isdirty&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;      &lt;span class="n"&gt;pg_buffercache&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;
&lt;span class="k"&gt;JOIN&lt;/span&gt;      &lt;span class="n"&gt;pg_class&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt; 
  &lt;span class="k"&gt;ON&lt;/span&gt;      &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;relfilenode&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;relfilenode&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt;     &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;relname&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'buffer_test'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;bufferid&lt;/th&gt;
&lt;th&gt;relblocknumber&lt;/th&gt;
&lt;th&gt;isdirty&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;149&lt;/td&gt;
&lt;td&gt;9999&lt;/td&gt;
&lt;td&gt;false&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Not dirty anymore but still in cache!&lt;/p&gt;

&lt;h3&gt;
  
  
  Comparing Cached vs Uncached Access
&lt;/h3&gt;

&lt;p&gt;Let's demonstrate the performance benefit of the buffer cache by comparing access times for cached and uncached data. First, let's find which pages hold the rows around id &lt;code&gt;70000&lt;/code&gt;, the row we have been working with so far:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;    &lt;span class="n"&gt;ctid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; 
&lt;span class="k"&gt;FROM&lt;/span&gt;      &lt;span class="n"&gt;buffer_test&lt;/span&gt; 
&lt;span class="k"&gt;WHERE&lt;/span&gt;     &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;between&lt;/span&gt; &lt;span class="mi"&gt;69997&lt;/span&gt; &lt;span class="k"&gt;and&lt;/span&gt; &lt;span class="mi"&gt;70003&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;ctid&lt;/th&gt;
&lt;th&gt;id&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;(9999,4)&lt;/td&gt;
&lt;td&gt;69997&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;(9999,5)&lt;/td&gt;
&lt;td&gt;69998&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;(9999,6)&lt;/td&gt;
&lt;td&gt;69999&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;(9999,8)&lt;/td&gt;
&lt;td&gt;70000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;(10000,1)&lt;/td&gt;
&lt;td&gt;70001&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;(10000,2)&lt;/td&gt;
&lt;td&gt;70002&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;(10000,3)&lt;/td&gt;
&lt;td&gt;70003&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Notice that row &lt;code&gt;70000&lt;/code&gt; now shows &lt;code&gt;ctid&lt;/code&gt; (9999,8) instead of (9999,7): the earlier UPDATE created a new row version, which was placed in the next slot on the same page. Now restart the database server once more so we start with a clean cache, then query id &lt;code&gt;70000&lt;/code&gt; as before to load page &lt;code&gt;9999&lt;/code&gt; back into the buffer cache. We are now ready to compare access times using &lt;code&gt;EXPLAIN ANALYSE&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;EXPLAIN&lt;/span&gt; &lt;span class="k"&gt;ANALYSE&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;    &lt;span class="n"&gt;ctid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; 
&lt;span class="k"&gt;FROM&lt;/span&gt;      &lt;span class="n"&gt;buffer_test&lt;/span&gt; 
&lt;span class="k"&gt;WHERE&lt;/span&gt;     &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;69997&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="c1"&gt;-- Execution Time: 0.046 ms&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;EXPLAIN&lt;/span&gt; &lt;span class="k"&gt;ANALYSE&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;    &lt;span class="n"&gt;ctid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; 
&lt;span class="k"&gt;FROM&lt;/span&gt;      &lt;span class="n"&gt;buffer_test&lt;/span&gt; 
&lt;span class="k"&gt;WHERE&lt;/span&gt;     &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="c1"&gt;-- Execution Time: 0.604 ms&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As you can see, the execution time for accessing the cached data (row &lt;code&gt;69997&lt;/code&gt; lives on the cached page &lt;code&gt;9999&lt;/code&gt;) is about 13 times lower than for the uncached access!&lt;/p&gt;
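&lt;p&gt;If you'd rather not rely on wall-clock timings, PostgreSQL can also report buffer activity per query via the &lt;code&gt;BUFFERS&lt;/code&gt; option: &lt;code&gt;shared hit&lt;/code&gt; counts pages served from the buffer cache, while &lt;code&gt;read&lt;/code&gt; counts pages fetched from disk. A sketch:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;EXPLAIN (ANALYZE, BUFFERS)
SELECT ctid, id
FROM   buffer_test
WHERE  id = 69997;

-- Look for a line like "Buffers: shared hit=N" (served from cache)
-- versus "Buffers: shared read=N" (fetched from disk)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;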

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;We explored PostgreSQL's buffer cache through practical examples that demonstrated its basic memory management behavior. By creating a test table and using monitoring tools, we observed how data pages move between disk and memory during different operations. Our experiments showed how queries without indexes lead to sequential scans that load multiple pages into memory, while adding an index allowed PostgreSQL to load only the specific page needed. We also saw how pages get marked as "dirty" when modified and remain in cache even after a checkpoint writes them to disk. Finally, we demonstrated how PostgreSQL's buffer cache optimization works in practice by comparing query times between accessing rows from previously loaded pages versus pages that required fresh disk reads.&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>database</category>
    </item>
    <item>
      <title>Understanding JWT Authentication: Spring Security's Architecture and Go Implementation</title>
      <dc:creator>stringintech</dc:creator>
      <pubDate>Sat, 30 Nov 2024 15:40:47 +0000</pubDate>
      <link>https://dev.to/stringintech/understanding-jwt-authentication-spring-securitys-architecture-and-go-implementation-edk</link>
      <guid>https://dev.to/stringintech/understanding-jwt-authentication-spring-securitys-architecture-and-go-implementation-edk</guid>
      <description>&lt;p&gt;After setting up JWT stateless authentication (available &lt;a href="https://github.com/stringintech/security-101/tree/main/java" rel="noopener noreferrer"&gt;here&lt;/a&gt;), I wanted to understand what happens under Spring Security's abstractions by identifying key components and their interactions. To make this exploration more engaging, I reimplemented &lt;a href="https://github.com/stringintech/security-101/tree/main/go" rel="noopener noreferrer"&gt;a minimal version&lt;/a&gt; in Go using the standard HTTP library. By breaking down three core flows - registration, token generation, and protected resource access - and rebuilding them in Go, I set out to map Spring Security's authentication patterns to simpler components.&lt;/p&gt;

&lt;p&gt;This post focuses specifically on authentication flows - how the system verifies user identity - rather than authorization. We'll explore the flows with sequence diagrams that trace requests through different components in Spring Security's architecture.&lt;/p&gt;

&lt;h2&gt;
  
  
  Main Components
&lt;/h2&gt;

&lt;p&gt;The system provides three endpoints:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;User Registration: Accepts username and password from new users&lt;/li&gt;
&lt;li&gt;Token Generation (Login): Creates a JWT token when users successfully log in with valid credentials&lt;/li&gt;
&lt;li&gt;Protected Access: Enables authenticated users to access protected resources using their token. The &lt;code&gt;getAuthenticatedUser&lt;/code&gt; endpoint serves as an example, returning profile information for the authenticated token holder&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In the following sections, I explain the core components involved in each flow, with a sequence diagram for each.&lt;/p&gt;

&lt;h3&gt;
  
  
  Registration Flow
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpccqwinv4eqgput4yy5j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpccqwinv4eqgput4yy5j.png" alt="Registration Flow" width="800" height="458"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A registration request containing username and password passes through the &lt;strong&gt;Spring Security filter chain&lt;/strong&gt;, where minimal processing occurs since the registration endpoint was configured to not require authentication in &lt;a href="https://github.com/stringintech/security-101/blob/9be5bc387208fa8ade2edb35431ecace769f52f7/java/src/main/java/com/stringintech/security101/config/SecurityConfiguration.java#L35" rel="noopener noreferrer"&gt;SecurityConfiguration&lt;/a&gt;. The request then moves through Spring's &lt;code&gt;DispatcherServlet&lt;/code&gt;, which routes it to the appropriate method in &lt;code&gt;UserController&lt;/code&gt; based on the URL pattern. The request reaches UserController's &lt;a href="https://github.com/stringintech/security-101/blob/9be5bc387208fa8ade2edb35431ecace769f52f7/java/src/main/java/com/stringintech/security101/controller/UserController.java#L36" rel="noopener noreferrer"&gt;register&lt;/a&gt; endpoint, where the user information is stored along with a &lt;strong&gt;hashed&lt;/strong&gt; password.&lt;/p&gt;

&lt;h3&gt;
  
  
  Token Generation Flow
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdb2yavnjiudfezopijdr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdb2yavnjiudfezopijdr.png" alt="Token Generation Flow" width="800" height="438"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A login request containing username and password passes through the &lt;strong&gt;Spring Security filter chain&lt;/strong&gt;, where minimal processing occurs as this endpoint is also configured to not require authentication in &lt;a href="https://github.com/stringintech/security-101/blob/9be5bc387208fa8ade2edb35431ecace769f52f7/java/src/main/java/com/stringintech/security101/config/SecurityConfiguration.java#L35" rel="noopener noreferrer"&gt;SecurityConfiguration&lt;/a&gt;. The request moves through Spring's &lt;code&gt;DispatcherServlet&lt;/code&gt; to UserController's &lt;a href="https://github.com/stringintech/security-101/blob/9be5bc387208fa8ade2edb35431ecace769f52f7/java/src/main/java/com/stringintech/security101/controller/UserController.java#L44" rel="noopener noreferrer"&gt;login&lt;/a&gt; endpoint, which delegates to &lt;code&gt;AuthenticationManager&lt;/code&gt;. Using the configured beans defined in &lt;a href="https://github.com/stringintech/security-101/blob/main/java/src/main/java/com/stringintech/security101/config/ApplicationConfiguration.java" rel="noopener noreferrer"&gt;ApplicationConfiguration&lt;/a&gt;, AuthenticationManager verifies the provided credentials against stored ones. After successful authentication, the UserController uses &lt;code&gt;JwtService&lt;/code&gt; to generate a JWT token containing the user's information and metadata like creation time, which is returned to the client for subsequent authenticated requests.&lt;/p&gt;

&lt;h3&gt;
  
  
  Protected Resource Access Flow
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Successful Authentication Flow (200)
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2lbl82wh9w4hl6dhll4n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2lbl82wh9w4hl6dhll4n.png" alt="Successful Authentication Flow (200)" width="800" height="413"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Failed Authentication Flow (401)
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd49tjsuaxxwznk34q7fz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd49tjsuaxxwznk34q7fz.png" alt="Failed Authentication Flow (401)" width="800" height="418"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When a request containing a JWT token in its Authorization header arrives, it passes through the &lt;a href="https://github.com/stringintech/security-101/blob/main/java/src/main/java/com/stringintech/security101/config/JwtAuthenticationFilter.java" rel="noopener noreferrer"&gt;JwtAuthenticationFilter&lt;/a&gt; - a custom defined &lt;code&gt;OncePerRequestFilter&lt;/code&gt; - which processes the token using &lt;code&gt;JwtService&lt;/code&gt;. If valid, the filter retrieves the user via &lt;code&gt;UserDetailsService&lt;/code&gt; configured in &lt;a href="https://github.com/stringintech/security-101/blob/9be5bc387208fa8ade2edb35431ecace769f52f7/java/src/main/java/com/stringintech/security101/config/ApplicationConfiguration.java#L25" rel="noopener noreferrer"&gt;ApplicationConfiguration&lt;/a&gt; and sets the authentication in &lt;code&gt;SecurityContextHolder&lt;/code&gt;. If the token is missing or invalid, the filter allows the request to continue without setting authentication.&lt;/p&gt;

&lt;p&gt;Later in the chain, &lt;code&gt;AuthorizationFilter&lt;/code&gt; checks if the request is properly authenticated via SecurityContextHolder. When it detects missing authentication, it throws an &lt;code&gt;AccessDeniedException&lt;/code&gt;. This exception is caught by &lt;code&gt;ExceptionTranslationFilter&lt;/code&gt;, which checks if the user is anonymous and delegates to the configured &lt;a href="https://github.com/stringintech/security-101/blob/main/java/src/main/java/com/stringintech/security101/config/JwtAuthenticationEntryPoint.java" rel="noopener noreferrer"&gt;JwtAuthenticationEntryPoint&lt;/a&gt; in &lt;a href="https://github.com/stringintech/security-101/blob/9be5bc387208fa8ade2edb35431ecace769f52f7/java/src/main/java/com/stringintech/security101/config/SecurityConfiguration.java#L41" rel="noopener noreferrer"&gt;SecurityConfiguration&lt;/a&gt; to return a 401 Unauthorized response.&lt;/p&gt;

&lt;p&gt;If all filters pass, the request reaches Spring's &lt;code&gt;DispatcherServlet&lt;/code&gt; which routes it to the &lt;a href="https://github.com/stringintech/security-101/blob/9be5bc387208fa8ade2edb35431ecace769f52f7/java/src/main/java/com/stringintech/security101/controller/UserController.java#L57" rel="noopener noreferrer"&gt;getAuthenticatedUser&lt;/a&gt; endpoint in &lt;code&gt;UserController&lt;/code&gt;. This endpoint retrieves the authenticated user information from SecurityContextHolder that was populated during the filter chain process.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; Spring Security employs a rich ecosystem of filters and specialized components to handle various security concerns. To keep the core authentication flow clear, I focused only on the key players in JWT token validation and user authentication.&lt;/p&gt;

&lt;h2&gt;
  
  
  Go Implementation: Mapping Components
&lt;/h2&gt;

&lt;p&gt;The Go implementation provides similar functionality through a simplified architecture that maps to key Spring Security components:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/stringintech/security-101/blob/main/go/auth/filter_chain.go" rel="noopener noreferrer"&gt;FilterChain&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Provides a minimal version of Spring Security's filter chain&lt;/li&gt;
&lt;li&gt;Processes filters sequentially for each request&lt;/li&gt;
&lt;li&gt;Uses a per-request chain instance (&lt;a href="https://github.com/stringintech/security-101/blob/main/go/auth/virtual_filter_chain.go" rel="noopener noreferrer"&gt;VirtualFilterChain&lt;/a&gt;) for &lt;strong&gt;thread safety&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/stringintech/security-101/blob/main/go/server/dispatcher.go" rel="noopener noreferrer"&gt;Dispatcher&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Maps to Spring's &lt;code&gt;DispatcherServlet&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Routes requests to appropriate handlers after security filter processing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Authentication &lt;a href="https://github.com/stringintech/security-101/blob/main/go/auth/context.go" rel="noopener noreferrer"&gt;Context&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Uses Go's &lt;code&gt;context&lt;/code&gt; package to store authentication state per request&lt;/li&gt;
&lt;li&gt;Maps to Spring's &lt;code&gt;SecurityContextHolder&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/stringintech/security-101/blob/main/go/auth/jwt_filter.go" rel="noopener noreferrer"&gt;JwtFilter&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Direct equivalent to Spring's &lt;code&gt;JwtAuthenticationFilter&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Extracts and validates JWT tokens&lt;/li&gt;
&lt;li&gt;Populates authentication context on successful validation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/stringintech/security-101/blob/main/go/auth/auth_filter.go" rel="noopener noreferrer"&gt;AuthenticationFilter&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Simplified version of Spring's &lt;code&gt;AuthorizationFilter&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Focuses solely on authentication verification&lt;/li&gt;
&lt;li&gt;Checks authentication context and returns 401 if missing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/stringintech/security-101/blob/main/go/auth/jwt_service.go" rel="noopener noreferrer"&gt;JwtService&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Counterpart of the Java implementation's &lt;code&gt;JwtService&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Handles token generation and validation&lt;/li&gt;
&lt;li&gt;Uses the same core JWT operations but with simpler configuration&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Test Coverage
&lt;/h2&gt;

&lt;p&gt;Both implementations include integration tests (&lt;a href="https://github.com/stringintech/security-101/blob/main/go/test/auth_test.go" rel="noopener noreferrer"&gt;auth_test.go&lt;/a&gt; and &lt;a href="https://github.com/stringintech/security-101/blob/main/java/src/test/java/com/stringintech/security101/AuthTest.java" rel="noopener noreferrer"&gt;AuthTest.java&lt;/a&gt;) verifying key authentication scenarios:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Registration Flow&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Successful user registration with valid credentials&lt;/li&gt;
&lt;li&gt;Duplicate username registration attempt&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Login Flow&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Successful login with valid credentials&lt;/li&gt;
&lt;li&gt;Login attempt with non-existent username&lt;/li&gt;
&lt;li&gt;Login attempt with incorrect password&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Protected Resource Access&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Successful access with valid token&lt;/li&gt;
&lt;li&gt;Access attempt without auth header&lt;/li&gt;
&lt;li&gt;Access attempt with invalid token format&lt;/li&gt;
&lt;li&gt;Access attempt with expired token&lt;/li&gt;
&lt;li&gt;Access attempt with valid token format but non-existent user&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The Java implementation includes detailed comments explaining the flow of each test scenario through Spring Security's filter chain. These same flows are replicated in the Go implementation using equivalent components.&lt;/p&gt;

&lt;h2&gt;
  
  
  Journey Summary
&lt;/h2&gt;

&lt;p&gt;I broke Spring Security's JWT authentication down into flows and test cases, then mapped those patterns onto Go components. The integration tests traced how requests move through Spring Security's filter chain, and rebuilding simplified versions of each component clarified why the framework is designed the way it is. Since both implementations pass the same test scenarios, they demonstrably handle authentication the same way.&lt;/p&gt;

</description>
      <category>spring</category>
      <category>security</category>
      <category>java</category>
      <category>go</category>
    </item>
    <item>
      <title>Optimizing PostgreSQL Mass Deletions with Table Partitioning</title>
      <dc:creator>stringintech</dc:creator>
      <pubDate>Mon, 07 Oct 2024 19:41:46 +0000</pubDate>
      <link>https://dev.to/stringintech/optimizing-postgresql-mass-deletions-with-table-partitioning-4ai4</link>
      <guid>https://dev.to/stringintech/optimizing-postgresql-mass-deletions-with-table-partitioning-4ai4</guid>
      <description>&lt;p&gt;In database management, handling large-scale data operations efficiently is critical. One common challenge is executing mass deletions on large tables without dragging down overall performance. This article looks at how PostgreSQL's table partitioning feature can significantly speed up the process and &lt;br&gt;
help maintain smooth database operations.&lt;/p&gt;

&lt;p&gt;Check out more of my work &lt;a href="https://stringintech.github.io/blog/p/optimizing-postgresql-mass-deletions-with-table-partitioning/" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;
  
  
  The Challenge of Mass Deletions
&lt;/h2&gt;

&lt;p&gt;Deleting a large number of rows from a PostgreSQL table can be a time-consuming operation. It involves:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Scanning through the table to find the rows to delete&lt;/li&gt;
&lt;li&gt;Removing the rows and updating indexes&lt;/li&gt;
&lt;li&gt;Vacuuming the table to reclaim space&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;For tables with millions of rows, this process can lead to long-running transactions and table locks, potentially impacting database responsiveness.&lt;/p&gt;
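&lt;p&gt;Concretely, the slow path the benchmark measures is plain SQL along these lines (the table name and date range here are illustrative, matching the example schema shown later):&lt;/p&gt;

```sql
-- Mass deletion on a non-partitioned table: the planner scans for matching
-- rows, deletes them one by one, and updates every index on the table.
DELETE FROM records
WHERE time >= '2023-01-01' AND time < '2023-01-08';

-- The deleted rows' space is only marked reusable afterwards; autovacuum
-- normally does this in the background, shown explicitly here.
VACUUM records;
```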
&lt;h2&gt;
  
  
  Enter Table Partitioning
&lt;/h2&gt;

&lt;p&gt;Table partitioning is a technique where a large table is divided into smaller, more manageable pieces called partitions. These partitions are separate tables that share the same schema as the parent table.&lt;/p&gt;
&lt;h2&gt;
  
  
  My Benchmark Setup
&lt;/h2&gt;

&lt;p&gt;To quantify the benefits of partitioning, I set up a benchmark with three scenarios using PostgreSQL in a containerized environment:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Simple Table:&lt;/strong&gt; A standard, non-partitioned table&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Partitioned Table (Row Deletion):&lt;/strong&gt; A table partitioned by week, deleting rows from the first week&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Partitioned Table (Partition Drop):&lt;/strong&gt; Same as #2, but dropping the entire first week's partition&lt;/li&gt;
&lt;/ol&gt;
&lt;h3&gt;
  
  
  PostgreSQL Container Specifications
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;PostgreSQL Version: 16.4&lt;/li&gt;
&lt;li&gt;Docker Version: 27.0.3&lt;/li&gt;
&lt;li&gt;Resource Limits:

&lt;ul&gt;
&lt;li&gt;CPU Limit: 8 CPUs&lt;/li&gt;
&lt;li&gt;Memory Limit: 1 GB&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Data Characteristics
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Total Records: 4 million&lt;/li&gt;
&lt;li&gt;Distribution: Evenly distributed over 4 weeks (1 million per week)&lt;/li&gt;
&lt;li&gt;Indexing: Both tables (simple and partitioned) have an index on the &lt;code&gt;time&lt;/code&gt; column&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Key Findings
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;Deletion Time&lt;/th&gt;
&lt;th&gt;Table Size&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Simple Table&lt;/td&gt;
&lt;td&gt;1.26s&lt;/td&gt;
&lt;td&gt;728 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Partitioned (Delete Rows)&lt;/td&gt;
&lt;td&gt;734ms&lt;/td&gt;
&lt;td&gt;908 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Partitioned (Drop Partition)&lt;/td&gt;
&lt;td&gt;6.43ms&lt;/td&gt;
&lt;td&gt;908 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Dramatic Speed Improvement:&lt;/strong&gt; Dropping a partition (6.43ms) is about 196 times faster than deleting the same rows from a simple table (1.26s).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Storage Trade-off:&lt;/strong&gt; Partitioned tables use about 25% more storage due to additional metadata and per-partition indexes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Minimal Insertion Impact:&lt;/strong&gt; Partitioning only slightly increased data population time (by about 2.8%).&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;
  
  
  Why It Works
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Targeted Operations:&lt;/strong&gt; Partitioning allows the database to work with a subset of the data, reducing the scope of operations.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Metadata Operations:&lt;/strong&gt; Dropping a partition is primarily a metadata operation, avoiding the need to scan and delete individual rows.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reduced Lock Contention:&lt;/strong&gt; Smaller partitions mean fewer locks, allowing for better concurrency.&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;
  
  
  Implementation Highlights
&lt;/h2&gt;

&lt;p&gt;Here's a simplified example of how to set up a partitioned table in PostgreSQL:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;records&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="n"&gt;BIGSERIAL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nb"&gt;time&lt;/span&gt; &lt;span class="n"&gt;TIMESTAMPTZ&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;body&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="k"&gt;RANGE&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;time&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;records_week_1&lt;/span&gt; &lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="k"&gt;OF&lt;/span&gt; &lt;span class="n"&gt;records&lt;/span&gt;
    &lt;span class="k"&gt;FOR&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'2023-01-01'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'2023-01-08'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- Create index on the partition&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_records_week_1_time&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;records_week_1&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;time&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- To delete a week's worth of data:&lt;/span&gt;
&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;records&lt;/span&gt; &lt;span class="n"&gt;DETACH&lt;/span&gt; &lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="n"&gt;records_week_1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;DROP&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;records_week_1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;For databases dealing with time-series data or any scenario where large-scale deletions are common, implementing table partitioning can lead to significant performance improvements. While there's a small trade-off in storage and insertion speed, the gains in deletion efficiency often far outweigh these costs.&lt;/p&gt;

&lt;p&gt;By leveraging partitioning, you can maintain high performance even as your data grows, ensuring your PostgreSQL database remains responsive and efficient.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/stringintech/db-stuff/tree/main/postgres-partitioning" rel="noopener noreferrer"&gt;Link to the full benchmark code and detailed results&lt;/a&gt;&lt;/p&gt;

</description>
      <category>database</category>
      <category>postgres</category>
      <category>go</category>
    </item>
  </channel>
</rss>
