Hello, I'm Maneshwar. I'm building git-lrc, an AI code reviewer that runs on every commit. It's free, unlimited, and source-available on GitHub. Star us to help other devs discover the project, and do give it a try and share your feedback so we can improve it.
When a database needs to search, insert, or update records quickly, it relies on index structures. The B-Tree has been the standard index used by storage engines such as InnoDB (MySQL), SQLite, PostgreSQL (for many workloads), and countless embedded systems.
How B-Trees Work
A B-Tree organizes records in sorted order and splits data into pages, which are fixed-size blocks of storage.
A page is the smallest unit of reading and writing between memory and disk.
Typical page sizes are 4 KB or 8 KB, commonly aligned with the underlying storage block size.
When you query a B-Tree:
- The database reads one page from disk at a time.
- Each page contains many records.
- Following pointers from page to page narrows the search until the exact record is found.
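The page-by-page descent above can be sketched in a few lines of Python. This is an illustrative toy structure, not any real engine's API: interior pages hold separator keys and child pointers, leaf pages hold the records, and each hop from one page to the next would be one disk read in a real system.

```python
import bisect

class Page:
    """A toy B-Tree page: interior pages have children, leaf pages have records."""
    def __init__(self, keys, children=None, records=None):
        self.keys = keys          # sorted separator keys (interior) or record keys (leaf)
        self.children = children  # list of child Pages, or None for a leaf
        self.records = records    # dict key -> value for a leaf, or None

def search(page, key):
    """Descend one page at a time until a leaf is reached."""
    while page.children is not None:             # interior page
        i = bisect.bisect_right(page.keys, key)  # pick the child subtree
        page = page.children[i]                  # one "disk read" per level
    return page.records.get(key)                 # leaf: exact record or None

leaf1 = Page(keys=[1, 5], records={1: "a", 5: "b"})
leaf2 = Page(keys=[12, 20], records={12: "c", 20: "d"})
root = Page(keys=[10], children=[leaf1, leaf2])

print(search(root, 12))  # -> c
```

Because the tree is shallow and each page holds many keys, the number of page reads per lookup stays small even for very large datasets.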
This design minimizes disk I/O, which historically has been the slowest part of database access.
Why Pages Matter
Even when only a small portion of a page is read or modified, the database must fetch or rewrite the entire page.
Example:
- Page size: 4 KB
- One record change: 50 bytes
- Disk must rewrite: 4 KB
This leads to a performance issue known as write amplification.
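The numbers above make the amplification easy to quantify. A quick back-of-envelope calculation (using the example figures, not measurements from a real engine):

```python
# Back-of-envelope write amplification for an in-place page update.
PAGE_SIZE = 4 * 1024   # 4 KB page
RECORD_CHANGE = 50     # bytes the application actually modified

amplification = PAGE_SIZE / RECORD_CHANGE
print(f"{amplification:.1f}x")  # roughly 82x more bytes written than needed
```

A 50-byte change costing a 4 KB write means the disk does about 82x the work the application asked for, on every such update.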
Problem #1 — High Write Amplification
Because B-Trees update data in place, modifying a small value forces the system to rewrite the full page.
Effects:
- Disk bandwidth is wasted
- SSD wear increases
- Throughput drops under heavy write loads
This becomes painful in workloads like:
- Logging
- High-frequency inserts and updates
- Secondary indexes with very small keys and values
Problem #2 — Inefficient Caching
B-Trees cache data in memory at page granularity.
If only 5 records in a page are frequently accessed (hot), the system still caches the entire page:
- Memory is wasted holding cold data
- Cache hit rate suffers
- Hot records compete with irrelevant ones
For large databases where RAM is the real performance bottleneck, page-level caching becomes inefficient.
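The cache-waste argument can also be put in numbers. With the illustrative figures below (assumed for the sketch, not measured from any engine), caching a whole page to keep 5 hot records resident wastes the vast majority of the memory spent:

```python
# Rough estimate of cache waste under page-granularity caching.
RECORDS_PER_PAGE = 100   # assumed records per 4 KB page
HOT_RECORDS = 5          # records on that page that are actually hot

wasted_fraction = 1 - HOT_RECORDS / RECORDS_PER_PAGE
print(f"{wasted_fraction:.0%} of cached bytes hold cold records")  # -> 95%
```

In other words, for every byte of hot data kept in memory, roughly 19 cold bytes ride along with it.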
Problem #3 — Secondary Index Pain
In a secondary index, keys and values are tiny.
But because the B-Tree still works page-by-page:
- Writes rewrite many more bytes than needed
- Cache fills with mostly unused data
- Both reads and writes degrade as the index grows
Existing Attempts to Fix B-Trees
Many systems tried to reduce write amplification:
LSM Trees (used in RocksDB, LevelDB, Cassandra)
They avoid in-place updates by appending to logs.
- Pros: much better write throughput
- Cons: slower reads, expensive compaction, degraded scan performance
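The LSM trade-off can be seen in a toy write path. This is a simplified sketch in the spirit of an LSM tree, not RocksDB's actual design: writes go to an in-memory table (no page rewrite), flushes produce immutable sorted runs, and reads must check the newest data first and possibly scan several runs, which is where the read-side cost comes from.

```python
class ToyLSM:
    """Toy LSM-style store: append-friendly writes, multi-run reads."""
    def __init__(self):
        self.memtable = {}   # recent writes, held in memory
        self.segments = []   # older immutable sorted runs, newest first

    def put(self, key, value):
        self.memtable[key] = value          # no in-place page rewrite

    def flush(self):
        # Freeze the memtable into an immutable sorted run.
        self.segments.insert(0, dict(sorted(self.memtable.items())))
        self.memtable = {}

    def get(self, key):
        if key in self.memtable:            # newest data wins
            return self.memtable[key]
        for seg in self.segments:           # may touch many runs: slower reads
            if key in seg:
                return seg[key]
        return None

db = ToyLSM()
db.put("k1", "v1")
db.flush()
db.put("k1", "v2")                          # shadows the flushed value
print(db.get("k1"))  # -> v2
```

Compaction, the expensive background step mentioned above, would merge those segments back into fewer runs so reads stay bounded.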
Bw-Tree & Bε-Tree
Use delta records instead of rewriting whole pages.
- Pros: fewer full page rewrites
- Cons: reads follow long chains; scans become slow
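Delta records are easy to picture with a small sketch (a simplified illustration of the Bw-Tree idea, not its real lock-free implementation): each update prepends a tiny delta node in front of the base page, so nothing is rewritten, but every read must walk the chain until it hits the base page.

```python
class Delta:
    """A small update record prepended in front of a page."""
    def __init__(self, key, value, next_node):
        self.key, self.value, self.next = key, value, next_node

class BasePage:
    """The original, unrewritten page contents."""
    def __init__(self, records):
        self.records = records

def read(node, key):
    while isinstance(node, Delta):      # chain walk: cost grows with each update
        if node.key == key:
            return node.value
        node = node.next
    return node.records.get(key)        # fall through to the base page

base = BasePage({"a": 1, "b": 2})
node = Delta("a", 10, base)             # update "a" without rewriting the page
node = Delta("c", 3, node)              # insert "c" as another delta
print(read(node, "a"), read(node, "b"))  # -> 10 2
```

Real systems periodically consolidate the chain back into a fresh page, which is exactly the cost the cons above refer to: until consolidation happens, every read pays for every prior update.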
Record caching systems (Anti-Caching, Siberia, TreeLine, TwoTree)
Place hot records in a separate in-memory structure.
- Pros: improves point lookups
- Cons: poor scan performance, complicated tuning, fragmented design
Every improvement solves one problem but hurts another.
What’s Next
A new approach, the Bf-Tree, proposes separating in-memory cache pages from on-disk pages and using smaller, variable-size “mini-pages”, so that full pages need not be rewritten and irrelevant data need not be cached.
Early results show significant performance gains in both reads and writes.
But that’s a deep topic.
Stay tuned
In the next article, we’ll dive into:
- How Bf-Trees work internally
- What mini-pages are
- Why they provide better caching and lower write amplification
- Performance comparisons against B-Trees and LSM Trees
*AI agents write code fast. They also silently remove logic, change behavior, and introduce bugs -- without telling you. You often find out in production.
git-lrc fixes this. It hooks into git commit and reviews every diff before it lands. 60-second setup. Completely free.*
Any feedback or contributors are welcome! It's online, source-available, and ready for anyone to use.
⭐ Star it on GitHub: HexmosTech/git-lrc (Free, Unlimited AI Code Reviews That Run on Commit)