<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Prajwal zore</title>
    <description>The latest articles on DEV Community by Prajwal zore (@prajwal_zore_lm10).</description>
    <link>https://dev.to/prajwal_zore_lm10</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3800069%2Fcb74e890-074c-497e-9fdc-f11560f5e45b.png</url>
      <title>DEV Community: Prajwal zore</title>
      <link>https://dev.to/prajwal_zore_lm10</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/prajwal_zore_lm10"/>
    <language>en</language>
    <item>
      <title>Inside My Custom malloc: Bins, tcache, mmap, and Thread Safety</title>
      <dc:creator>Prajwal zore</dc:creator>
      <pubDate>Sun, 03 May 2026 10:52:51 +0000</pubDate>
      <link>https://dev.to/prajwal_zore_lm10/inside-my-custom-malloc-bins-tcache-mmap-and-thread-safety-566m</link>
      <guid>https://dev.to/prajwal_zore_lm10/inside-my-custom-malloc-bins-tcache-mmap-and-thread-safety-566m</guid>
      <description>&lt;p&gt;Most developers use &lt;code&gt;malloc&lt;/code&gt; without thinking much about what happens underneath.&lt;br&gt;&lt;br&gt;
This project is an attempt to explore that layer by building a memory allocator from scratch in C.&lt;/p&gt;

&lt;p&gt;The allocator implements &lt;code&gt;malloc&lt;/code&gt;, &lt;code&gt;free&lt;/code&gt;, &lt;code&gt;calloc&lt;/code&gt;, and &lt;code&gt;realloc&lt;/code&gt; without relying on libc’s heap functions. It focuses on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Thread safety&lt;/li&gt;
&lt;li&gt;Per-thread caching (tcache)&lt;/li&gt;
&lt;li&gt;Efficient free block management using bins&lt;/li&gt;
&lt;li&gt;mmap-based memory growth&lt;/li&gt;
&lt;li&gt;Handling large allocations separately&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This article breaks down the design, implementation decisions, performance characteristics, and limitations of the allocator.&lt;/p&gt;




&lt;h2&gt;
  
  
  What is a Memory Allocator?
&lt;/h2&gt;

&lt;p&gt;A memory allocator is responsible for managing dynamic memory at runtime.&lt;br&gt;&lt;br&gt;
Functions like &lt;code&gt;malloc&lt;/code&gt;, &lt;code&gt;free&lt;/code&gt;, &lt;code&gt;calloc&lt;/code&gt;, and &lt;code&gt;realloc&lt;/code&gt; are part of this layer.&lt;/p&gt;

&lt;p&gt;At a high level, an allocator:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Requests memory from the operating system (e.g., using &lt;code&gt;mmap&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Splits that memory into smaller blocks&lt;/li&gt;
&lt;li&gt;Tracks which blocks are free or in use&lt;/li&gt;
&lt;li&gt;Reuses freed blocks to avoid unnecessary system calls&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This layer sits between user programs and the OS, making memory allocation efficient and reusable.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Allocators Are Non-Trivial
&lt;/h2&gt;

&lt;p&gt;A good allocator must balance multiple competing goals:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Performance&lt;/strong&gt; → allocations should be fast
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Memory efficiency&lt;/strong&gt; → minimize fragmentation
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scalability&lt;/strong&gt; → handle multi-threaded workloads
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;System overhead&lt;/strong&gt; → reduce expensive syscalls
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Modern allocators like those in libc (e.g., ptmalloc) are highly optimized and use techniques such as arenas, bins, and thread-local caching.&lt;/p&gt;

&lt;p&gt;This project implements a simplified version of those ideas to understand how they work in practice.&lt;/p&gt;




&lt;h2&gt;
  
  
  Allocator Overview
&lt;/h2&gt;

&lt;p&gt;At a high level, the allocator follows two distinct paths based on allocation size:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Small allocations (&amp;lt; 128KB)&lt;/strong&gt; → handled through heap, bins, and per-thread cache&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Large allocations (≥ 128KB)&lt;/strong&gt; → handled using &lt;code&gt;mmap&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Allocation Flow
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://mermaid.live/edit#pako:eNpVUslu2zAQ_ZXBnBVDm6MFRYvajhs3TQ5NL63kA0ONLcEiqVIUWsfyv5eWHETliTNvGT5iTshVQZjirlZ_eMm0gR-rXII9nzNxFKyuFd_Czc1HWJza6pXgA3h-_LD4dM7lyFtc0P4ntT0ss2VJ_ACGM17SdsSXA35fmR5W2XcynZa2IxVY58N_nMeqtSZ32TcLQEmsuaJ3w_x19kxM8xJeKtlekfWgW6tOFj18yZ6bujLwMjEeCU_KwG4k3dsn_O6otQ2tBAjBGuBlJy-CaZ4n1cPmGqdmek8wzbR5z_T1LdN07GaS5yEbhrTHltvf3KKDe10VmBrdkYOCtGCXEk8XaY6mJEE5pvZaMH3IMZdnq2mY_KWUeJNp1e1LTHesbm3VNQUztKrYXrN3CsmC9NLGNpgG_nzwwPSEfzH1wmAWuXHgx34YuZEbOHjENA5nSRJ6gZ8k81t3HkZnB1-Hoe7sNvJiL5hbsh8mSRI5SEVllH4c12fYovM_HZG2xQ" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2Fpako%3AeNpVUslu2zAQ_ZXBnBVDm6MFRYvajhs3TQ5NL63kA0ONLcEiqVIUWsfyv5eWHETliTNvGT5iTshVQZjirlZ_eMm0gR-rXII9nzNxFKyuFd_Czc1HWJza6pXgA3h-_LD4dM7lyFtc0P4ntT0ss2VJ_ACGM17SdsSXA35fmR5W2XcynZa2IxVY58N_nMeqtSZ32TcLQEmsuaJ3w_x19kxM8xJeKtlekfWgW6tOFj18yZ6bujLwMjEeCU_KwG4k3dsn_O6otQ2tBAjBGuBlJy-CaZ4n1cPmGqdmek8wzbR5z_T1LdN07GaS5yEbhrTHltvf3KKDe10VmBrdkYOCtGCXEk8XaY6mJEE5pvZaMH3IMZdnq2mY_KWUeJNp1e1LTHesbm3VNQUztKrYXrN3CsmC9NLGNpgG_nzwwPSEfzH1wmAWuXHgx34YuZEbOHjENA5nSRJ6gZ8k81t3HkZnB1-Hoe7sNvJiL5hbsh8mSRI5SEVllH4c12fYovM_HZG2xQ%3Ftype%3Dpng" alt="flow chart" width="780" height="766"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;mymalloc(size)&lt;/code&gt; is called&lt;/li&gt;
&lt;li&gt;Size is aligned to 8 bytes&lt;/li&gt;
&lt;li&gt;If size &amp;lt; 128KB:

&lt;ul&gt;
&lt;li&gt;Check per-thread tcache&lt;/li&gt;
&lt;li&gt;If hit → return immediately (no locking)&lt;/li&gt;
&lt;li&gt;If miss → acquire global heap lock

&lt;ul&gt;
&lt;li&gt;Search free bins&lt;/li&gt;
&lt;li&gt;If not found → request a new chunk from the OS via &lt;code&gt;mmap&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Split block if necessary&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;If size ≥ 128KB:

&lt;ul&gt;
&lt;li&gt;Try large block cache&lt;/li&gt;
&lt;li&gt;Otherwise call &lt;code&gt;mmap&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
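&lt;p&gt;Steps 2–4 can be sketched in a few lines of C (the 128KB threshold is from the flow above; the helper names are hypothetical):&lt;/p&gt;

```c
#include <stdbool.h>
#include <stddef.h>

#define ALIGNMENT 8
#define MMAP_THRESHOLD (128 * 1024)  /* 128KB boundary from the flow above */

/* Step 2: round the request up to the next multiple of 8 bytes. */
static size_t align8(size_t size) {
    return (size + (ALIGNMENT - 1)) & ~(size_t)(ALIGNMENT - 1);
}

/* Steps 3/4: decide between the small path (tcache/bins) and mmap. */
static bool is_large_request(size_t size) {
    return align8(size) >= MMAP_THRESHOLD;
}
```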




&lt;h3&gt;
  
  
  Free Flow
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;If block belongs to heap:

&lt;ul&gt;
&lt;li&gt;Push to thread-local tcache&lt;/li&gt;
&lt;li&gt;If tcache is full → flush to global bins + coalesce&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;If block is mmap’d:

&lt;ul&gt;
&lt;li&gt;Store in large cache or release via &lt;code&gt;munmap&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;




&lt;h3&gt;
  
  
  Each allocation is preceded by a metadata header:
&lt;/h3&gt;

&lt;p&gt;[ block_header | user_data ]&lt;/p&gt;

&lt;p&gt;The header stores:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;size&lt;/li&gt;
&lt;li&gt;allocation state&lt;/li&gt;
&lt;li&gt;mmap flag&lt;/li&gt;
&lt;li&gt;pointers for heap and bin lists
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌──────────────────────────────┐
│ block_header_t               │  48 bytes
│  size_t size                 │
│  int isfree                  │
│  int ismmapped               │
│  int in_tcache               │
│  block_header_t *next/prev   │  heap linked list
│  block_header_t *bin_next    │  bin free list
│  block_header_t *bin_prev    │  bin free list
├──────────────────────────────┤
│ user data                    │  size bytes  ← returned pointer
└──────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
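&lt;p&gt;Expressed as a C declaration, the header above might look like this (field names follow the diagram; the exact padding, and hence the 48-byte figure, depends on the platform):&lt;/p&gt;

```c
#include <stddef.h>

// Metadata header preceding every allocation (see diagram above).
typedef struct block_header {
    size_t size;                                // usable bytes that follow
    int    isfree;
    int    ismmapped;
    int    in_tcache;
    struct block_header *next, *prev;           // heap linked list
    struct block_header *bin_next, *bin_prev;   // bin free list
} block_header_t;

// The pointer returned to the caller starts just past the header.
static void *user_ptr(block_header_t *b) {
    return (void *)(b + 1);
}
```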






&lt;h3&gt;
  
  
  Free blocks are organized into 8 bins based on size ranges, which allows:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Faster lookup (O(1) class selection)&lt;/li&gt;
&lt;li&gt;Reduced search overhead
&lt;a href="https://mermaid.live/edit#pako:eNpdUltvmzAU_itHR-pbEgG5EJDWaQ1NL2unPexlhWhy4ATSGoyMWddB_vuw3SbR_GJ8znc5n3GHqcgIQ9xx8ZoWTCr4ESUVDOtLXL7tJNEGxuNLuOpuidUgJJTlsG-5SF8-H5LKYi8uwLRrpgpbudKs3hQNtodV_L1tClACapJjVUhiGaiUpQVtLGdlnKLOFiEVbaXg8hO43lJbaUhkZH9S08N1vOZG8Fzi2kis40fxm6xxox1zLraMw3ZfNe_AtQHexCvBODUpAcueWUqDoWVtzv2-iR5u40hU2uaY2FzE_4lPt9PD3TExZzKnX-eD3hn_--5Bd8AG3rWcfwS9PwX9GpdtdZTdnPf1YA92MBxhLvcZhkq2NMKSZMn0ETtNSFAVVFKC4fCZMfmSYFIdBk7Nqichyg-aFG1eYLhjvBlObZ0xRdGe5ZKdIFRlJFf652Do-a7RwLDDPxi63nziBcE08HzHcYP5dOi-YTj1J643C3zfdd2lu5g5i8MI_xpbZ7L0586wpsHCmfnefDlCyvZKyEf7MM37PPwD8cHRqg" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2Fpako%3AeNpdUltvmzAU_itHR-pbEgG5EJDWaQ1NL2unPexlhWhy4ATSGoyMWddB_vuw3SbR_GJ8znc5n3GHqcgIQ9xx8ZoWTCr4ESUVDOtLXL7tJNEGxuNLuOpuidUgJJTlsG-5SF8-H5LKYi8uwLRrpgpbudKs3hQNtodV_L1tClACapJjVUhiGaiUpQVtLGdlnKLOFiEVbaXg8hO43lJbaUhkZH9S08N1vOZG8Fzi2kis40fxm6xxox1zLraMw3ZfNe_AtQHexCvBODUpAcueWUqDoWVtzv2-iR5u40hU2uaY2FzE_4lPt9PD3TExZzKnX-eD3hn_--5Bd8AG3rWcfwS9PwX9GpdtdZTdnPf1YA92MBxhLvcZhkq2NMKSZMn0ETtNSFAVVFKC4fCZMfmSYFIdBk7Nqichyg-aFG1eYLhjvBlObZ0xRdGe5ZKdIFRlJFf652Do-a7RwLDDPxi63nziBcE08HzHcYP5dOi-YTj1J643C3zfdd2lu5g5i8MI_xpbZ7L0586wpsHCmfnefDlCyvZKyEf7MM37PPwD8cHRqg%3Ftype%3Dpng" alt="free chart" width="726" height="1052"&gt;&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  Each thread maintains its own cache of free blocks.
&lt;/h3&gt;

&lt;p&gt;This means every thread now has its own temporary store of free blocks that it can reuse as needed, without touching shared state.&lt;/p&gt;

&lt;p&gt;Benefits:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No locking on fast path&lt;/li&gt;
&lt;li&gt;High cache locality&lt;/li&gt;
&lt;li&gt;Significant performance boost in multi-threaded workloads&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://mermaid.live/edit#pako:eNqFkjtvwjAQgP-KdWsDxHYgj6ESSgUdyoKYmnQwiSERSZzajvoA_ntNAipQVG6yfffdJ59uC4lIOQSwKsRHkjGp0cs8rpAJ1SzXktUZWuBokUnOUoTfutQhxjgaF4VIBhPJOZrz94Yrrc4KQhzphCUZRzdp1Os9mpruiVdpXF1ryUlLzkFyT0uutJd0pyX_aOlJS89Bek9Lr7SXdKelf7RhO4fdLFcKDdCkaFS2Q9NoWoglK9AzZzVa5pVCD8jIN8eWIbkFHXP0Vg4sWMs8hUDLhltQclmywxW2ByoGnfGSxxCYY8rkJoa42humZtWrEOUJk6JZZxCsWKHMralTpvlTzszcfkvM37gMRVNpCCimbQ8ItvAJAXZo37U9SjziuLZrm-QXBJ7T930HU-L7w5E9dNy9Bd-t1O6PXOxhOqQ2cR3i2CMLeJprIWfdzraru_8BrTzOAA" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2Fpako%3AeNqFkjtvwjAQgP-KdWsDxHYgj6ESSgUdyoKYmnQwiSERSZzajvoA_ntNAipQVG6yfffdJ59uC4lIOQSwKsRHkjGp0cs8rpAJ1SzXktUZWuBokUnOUoTfutQhxjgaF4VIBhPJOZrz94Yrrc4KQhzphCUZRzdp1Os9mpruiVdpXF1ryUlLzkFyT0uutJd0pyX_aOlJS89Bek9Lr7SXdKelf7RhO4fdLFcKDdCkaFS2Q9NoWoglK9AzZzVa5pVCD8jIN8eWIbkFHXP0Vg4sWMs8hUDLhltQclmywxW2ByoGnfGSxxCYY8rkJoa42humZtWrEOUJk6JZZxCsWKHMralTpvlTzszcfkvM37gMRVNpCCimbQ8ItvAJAXZo37U9SjziuLZrm-QXBJ7T930HU-L7w5E9dNy9Bd-t1O6PXOxhOqQ2cR3i2CMLeJprIWfdzraru_8BrTzOAA%3Ftype%3Dpng" alt="tcache" width="860" height="428"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  Free Block Bins
&lt;/h3&gt;

&lt;p&gt;Free blocks are organized into 8 size-based bins.&lt;/p&gt;

&lt;p&gt;This enables:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;O(1) size class lookup
&lt;/li&gt;
&lt;li&gt;Reduced search overhead
&lt;/li&gt;
&lt;li&gt;Better reuse of memory blocks
&lt;/li&gt;
&lt;/ul&gt;
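&lt;p&gt;One way to get the O(1) lookup is doubling size ranges, so the bin index is just a shift count (the project's exact ranges aren't shown, so this is an illustrative scheme):&lt;/p&gt;

```c
#include <stddef.h>

#define NUM_BINS 8

/* Illustrative doubling ranges: bin 0 holds sizes below 32 bytes,
 * bin 1 holds 32-63, bin 2 holds 64-127, ... bin 7 holds the rest. */
static int bin_index(size_t size) {
    int idx = 0;
    size >>= 5;  /* sizes below 32 land in bin 0 */
    while (size > 0 && idx < NUM_BINS - 1) {
        size >>= 1;
        ++idx;
    }
    return idx;
}
```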




&lt;h2&gt;
  
  
  Benchmarks
&lt;/h2&gt;

&lt;p&gt;Tests were run on x86-64 Linux with 8 threads and &lt;code&gt;-O2&lt;/code&gt;.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Test&lt;/th&gt;
&lt;th&gt;Custom&lt;/th&gt;
&lt;th&gt;libc&lt;/th&gt;
&lt;th&gt;Result&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Single alloc/free (1M)&lt;/td&gt;
&lt;td&gt;58ms&lt;/td&gt;
&lt;td&gt;29ms&lt;/td&gt;
&lt;td&gt;2x slower&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Batch alloc(10k)&lt;/td&gt;
&lt;td&gt;1.44ms&lt;/td&gt;
&lt;td&gt;3.59ms&lt;/td&gt;
&lt;td&gt;2.5x faster&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Batch free(10k)&lt;/td&gt;
&lt;td&gt;0.36ms&lt;/td&gt;
&lt;td&gt;1.54ms&lt;/td&gt;
&lt;td&gt;4x faster&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mixed sizes(100k)&lt;/td&gt;
&lt;td&gt;6.46ms&lt;/td&gt;
&lt;td&gt;2.95ms&lt;/td&gt;
&lt;td&gt;2x slower&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Realloc chain(100k)&lt;/td&gt;
&lt;td&gt;6.42ms&lt;/td&gt;
&lt;td&gt;2.56ms&lt;/td&gt;
&lt;td&gt;2.5x slower&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multithreaded (8 threads × 5k each)&lt;/td&gt;
&lt;td&gt;64ms&lt;/td&gt;
&lt;td&gt;67ms&lt;/td&gt;
&lt;td&gt;Comparable&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Looking at the performance stats, it becomes clear why libc’s allocator is so highly optimized: it’s the result of decades of engineering and refinement.&lt;br&gt;
While my allocator produced only average results and doesn’t match libc’s performance, building it gave me a much deeper understanding of how memory allocators work and of what makes production-grade implementations fundamentally different.&lt;/p&gt;




&lt;h3&gt;
  
  
  Observations
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Batch workloads benefit heavily from tcache&lt;/li&gt;
&lt;li&gt;Large allocation cache reduces mmap calls significantly&lt;/li&gt;
&lt;li&gt;Global lock limits scalability&lt;/li&gt;
&lt;li&gt;libc remains more optimized for general workloads&lt;/li&gt;
&lt;li&gt;Next, I plan to implement per-thread arenas to eliminate global lock contention.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Limitations
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Single global lock limits scalability
&lt;/li&gt;
&lt;li&gt;No in-place realloc
&lt;/li&gt;
&lt;li&gt;No coalescing inside tcache
&lt;/li&gt;
&lt;li&gt;Large mmap blocks may waste memory
&lt;/li&gt;
&lt;li&gt;No cleanup on thread exit
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Source Code
&lt;/h2&gt;

&lt;p&gt;&lt;a href="//github.com/Whitfrost21/bump_allocator_c"&gt;You can explore the full source here.&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Building a memory allocator from scratch highlights the trade-offs between performance, complexity, and correctness.&lt;/p&gt;

&lt;p&gt;Even a simplified allocator quickly grows in complexity when thread safety, caching, and fragmentation are considered.&lt;/p&gt;

&lt;p&gt;If you have suggestions, optimizations, or questions, feel free to ask or start a discussion.&lt;/p&gt;

</description>
      <category>c</category>
      <category>linux</category>
      <category>architecture</category>
      <category>performance</category>
    </item>
    <item>
      <title>I Built malloc() from Scratch in C — Here’s What Went Wrong</title>
      <dc:creator>Prajwal zore</dc:creator>
      <pubDate>Sun, 26 Apr 2026 17:32:19 +0000</pubDate>
      <link>https://dev.to/prajwal_zore_lm10/i-built-malloc-from-scratch-in-c-heres-what-went-wrong-5f60</link>
      <guid>https://dev.to/prajwal_zore_lm10/i-built-malloc-from-scratch-in-c-heres-what-went-wrong-5f60</guid>
      <description>&lt;p&gt;Most of us use &lt;code&gt;malloc()&lt;/code&gt; without thinking about what happens underneath.&lt;/p&gt;

&lt;p&gt;I decided to implement my own memory allocator in C to understand it better. This wasn’t for production use, just to learn how allocation, fragmentation, and concurrency actually behave in practice.&lt;/p&gt;

&lt;p&gt;I also benchmarked it against glibc’s &lt;code&gt;malloc&lt;/code&gt; to see where it stands.&lt;/p&gt;




&lt;h2&gt;
  
  
  Implementation Overview
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://dev.to/prajwal_zore_lm10/inside-my-custom-malloc-bins-tcache-mmap-and-thread-safety-566m"&gt;(checkout entire implementation here)&lt;/a&gt;&lt;br&gt;
My allocator currently includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Thread-local cache&lt;/li&gt;
&lt;li&gt;Free lists (bins) for different size ranges&lt;/li&gt;
&lt;li&gt;Direct &lt;code&gt;mmap&lt;/code&gt; for larger allocations&lt;/li&gt;
&lt;li&gt;A custom &lt;code&gt;realloc()&lt;/code&gt; implementation&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Benchmark Results
&lt;/h2&gt;

&lt;h3&gt;
  
  
  glibc malloc
&lt;/h3&gt;

&lt;p&gt;Single-threaded:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;alloc + free (1M iterations): ~26 ms&lt;/li&gt;
&lt;li&gt;batch alloc/free: 3.40 ms / 0.95 ms&lt;/li&gt;
&lt;li&gt;mixed sizes: ~2.5 ms&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Multi-threaded:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;8 threads: ~57 ms&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  My Allocator
&lt;/h3&gt;

&lt;p&gt;Single-threaded:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;alloc + free (1M iterations): ~83 ms&lt;/li&gt;
&lt;li&gt;batch alloc/free: 1.46 ms / 0.50 ms (faster than glibc)&lt;/li&gt;
&lt;li&gt;mixed sizes: ~126 ms&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Multi-threaded:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;8 threads: ~791 ms&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  What Worked
&lt;/h2&gt;

&lt;p&gt;Batch allocation and free operations were faster than glibc.&lt;/p&gt;

&lt;p&gt;This likely comes from:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;simpler logic in the fast path&lt;/li&gt;
&lt;li&gt;low per-operation overhead&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So in very controlled scenarios, a simple allocator can outperform a general-purpose one.&lt;/p&gt;




&lt;h2&gt;
  
  
  Where It Struggled
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Mixed Allocation Sizes
&lt;/h3&gt;

&lt;p&gt;Performance dropped heavily when handling mixed sizes.&lt;/p&gt;

&lt;p&gt;The main issue was my bin design:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;limited number of bins&lt;/li&gt;
&lt;li&gt;coarse grouping of sizes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This leads to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;poor fit for requested sizes&lt;/li&gt;
&lt;li&gt;more fragmentation&lt;/li&gt;
&lt;li&gt;additional overhead during allocation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;glibc avoids this with more refined size classes.&lt;/p&gt;




&lt;h3&gt;
  
  
  Multithreading
&lt;/h3&gt;

&lt;p&gt;This was the biggest weakness.&lt;/p&gt;

&lt;p&gt;Even with thread-local caches, I ran into issues:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;shared access to heap structures&lt;/li&gt;
&lt;li&gt;contention when falling back to global data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I tried:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;global locks&lt;/li&gt;
&lt;li&gt;per-bin locks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Both increased complexity, and debugging became harder.&lt;/p&gt;




&lt;h3&gt;
  
  
  &lt;code&gt;realloc()&lt;/code&gt; Bug
&lt;/h3&gt;

&lt;p&gt;The most difficult issue I faced was in &lt;code&gt;realloc()&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;I initially made a mistake:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;allocating a new block using the old size&lt;/li&gt;
&lt;li&gt;instead of handling cases where the new size is smaller&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This caused:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;memory corruption&lt;/li&gt;
&lt;li&gt;segmentation faults later in execution&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The correct behavior:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;if &lt;code&gt;new_size &amp;lt;= old_size&lt;/code&gt;, shrink in place&lt;/li&gt;
&lt;li&gt;only allocate a new block when expansion is required&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Fixing this resolved the crashes.&lt;/p&gt;
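&lt;p&gt;A sketch of the corrected logic, using libc's &lt;code&gt;malloc&lt;/code&gt;/&lt;code&gt;free&lt;/code&gt; as stand-ins for the project's own functions (in the real allocator the old size comes from the block header):&lt;/p&gt;

```c
#include <stdlib.h>
#include <string.h>

void *my_realloc_fixed(void *ptr, size_t old_size, size_t new_size) {
    if (ptr == NULL) return malloc(new_size);
    if (new_size == 0) { free(ptr); return NULL; }
    if (new_size <= old_size)
        return ptr;                  /* shrink in place (a real allocator
                                        may also split the block) */
    void *fresh = malloc(new_size);  /* expand: allocate a new block, */
    if (fresh == NULL) return NULL;
    memcpy(fresh, ptr, old_size);    /* copy the old contents,        */
    free(ptr);                       /* then release the old block    */
    return fresh;
}
```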




&lt;h2&gt;
  
  
  Debugging Experience
&lt;/h2&gt;

&lt;p&gt;At one point, I removed locking entirely because debugging became too difficult.&lt;/p&gt;

&lt;p&gt;The issue turned out not to be concurrency, but incorrect logic in &lt;code&gt;realloc()&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Using &lt;code&gt;gdb&lt;/code&gt; helped identify the exact failure point.&lt;/p&gt;

&lt;p&gt;One key takeaway:&lt;/p&gt;

&lt;p&gt;Allocator bugs often don’t crash immediately.&lt;br&gt;
They corrupt memory and fail later, which makes debugging harder.&lt;/p&gt;




&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Simple designs can perform well in specific cases, but don’t scale&lt;/li&gt;
&lt;li&gt;Handling mixed allocation sizes efficiently requires better size class design&lt;/li&gt;
&lt;li&gt;Thread-local caching helps, but doesn’t eliminate shared state&lt;/li&gt;
&lt;li&gt;Concurrency adds complexity, especially when debugging&lt;/li&gt;
&lt;li&gt;Tools like &lt;code&gt;gdb&lt;/code&gt; are essential for low-level debugging&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Next Steps
&lt;/h2&gt;

&lt;p&gt;If I continue working on this allocator, I plan to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;improve size class handling&lt;/li&gt;
&lt;li&gt;introduce per-thread arenas&lt;/li&gt;
&lt;li&gt;reduce contention in shared structures&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;This project gave me a much clearer understanding of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;how allocators manage memory&lt;/li&gt;
&lt;li&gt;why fragmentation and contention matter&lt;/li&gt;
&lt;li&gt;why production allocators are complex&lt;/li&gt;
&lt;li&gt;why thread safety is important &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It’s one thing to read about memory allocation, and another to implement it and deal with its edge cases.&lt;/p&gt;

&lt;p&gt;If you're interested in systems programming, building a memory allocator is a worthwhile exercise.&lt;/p&gt;

</description>
      <category>c</category>
      <category>systems</category>
      <category>linux</category>
      <category>performance</category>
    </item>
    <item>
      <title>I Built a Simple Log Aggregation and Analytics tool</title>
      <dc:creator>Prajwal zore</dc:creator>
      <pubDate>Wed, 04 Mar 2026 08:40:43 +0000</pubDate>
      <link>https://dev.to/prajwal_zore_lm10/i-built-a-simple-log-aggregation-and-analytics-tool-2idp</link>
      <guid>https://dev.to/prajwal_zore_lm10/i-built-a-simple-log-aggregation-and-analytics-tool-2idp</guid>
      <description>&lt;h2&gt;
  
  
  I Built StackLens — A Simple Log Aggregation Dashboard
&lt;/h2&gt;

&lt;p&gt;Logs are one of the first things developers check when something goes wrong. But when logs come from multiple services, they quickly become hard to manage.&lt;/p&gt;

&lt;p&gt;To explore how centralized logging works, I built &lt;strong&gt;StackLens&lt;/strong&gt; — a full-stack log aggregation dashboard.&lt;/p&gt;

&lt;p&gt;The idea is simple: services send logs to a backend API, the logs are stored in PostgreSQL, and a web dashboard lets you search, filter, and analyze them.&lt;/p&gt;




&lt;h2&gt;
  
  
  Tech Stack
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;I started with the PERN stack. I don’t know much about other technologies yet and am still exploring, so I’m curious: which stack do you think is best suited for this project if we think about scaling it? I’d love to hear your opinion.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Frontend&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;React&lt;/li&gt;
&lt;li&gt;TypeScript&lt;/li&gt;
&lt;li&gt;TailwindCSS&lt;/li&gt;
&lt;li&gt;TanStack Query&lt;/li&gt;
&lt;li&gt;React Router&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Backend&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Node.js&lt;/li&gt;
&lt;li&gt;Express&lt;/li&gt;
&lt;li&gt;PostgreSQL&lt;/li&gt;
&lt;li&gt;Supabase (hosted Postgres)&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  What schema did I use?
&lt;/h2&gt;

&lt;p&gt;Each log entry looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"service"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"auth-service"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"level"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"info"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"message"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"User authenticated successfully"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"metadata"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"ip"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"192.168.1.10"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"timestamp"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2026-03-02T11:39:16Z"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Why I Built This
&lt;/h2&gt;

&lt;p&gt;The goal of this project was to practice building a real-world style dashboard that combines:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;backend APIs&lt;/li&gt;
&lt;li&gt;database design&lt;/li&gt;
&lt;li&gt;frontend data fetching&lt;/li&gt;
&lt;li&gt;responsive UI design&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Even though it's a simplified system, it helped me understand how log monitoring tools work behind the scenes.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Learned
&lt;/h2&gt;

&lt;p&gt;Building StackLens helped me practice:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;designing filtering APIs&lt;/li&gt;
&lt;li&gt;using PostgreSQL enums and JSONB&lt;/li&gt;
&lt;li&gt;managing server state with TanStack Query&lt;/li&gt;
&lt;li&gt;building responsive dashboard layouts&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;If you're interested, you can check out the project here:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub:&lt;/strong&gt; (&lt;a href="https://github.com/Whitfrost21/StackLens" rel="noopener noreferrer"&gt;https://github.com/Whitfrost21/StackLens&lt;/a&gt;)&lt;/p&gt;




&lt;p&gt;Thanks for reading! I’d love to hear from you if you build something similar.&lt;/p&gt;

</description>
      <category>beginners</category>
      <category>monitoring</category>
      <category>postgres</category>
      <category>showdev</category>
    </item>
    <item>
      <title>Is ChatGPT heavier now, or is that a bug?</title>
      <dc:creator>Prajwal zore</dc:creator>
      <pubDate>Sun, 01 Mar 2026 15:28:13 +0000</pubDate>
      <link>https://dev.to/prajwal_zore_lm10/chatgpt-is-heavier-or-what--47mm</link>
      <guid>https://dev.to/prajwal_zore_lm10/chatgpt-is-heavier-or-what--47mm</guid>
      <description>&lt;p&gt;hello devs,&lt;br&gt;
i was recently using chatgpt for a while and i got a surprise that when i try to open and start chats old chats which have a lot of conversations in it of course and suddenly my entire RAM was on the fire that my laptop started to freeze down. may be this is because chatgpt's DOM ,but they must optimize.&lt;br&gt;
Also switching chats is good option but it still misses topic and goes out of context, better if they provide a summary option which help keeping track of context. What do you think ? &lt;/p&gt;

</description>
      <category>chatgpt</category>
      <category>webdev</category>
      <category>discuss</category>
      <category>todayilearned</category>
    </item>
  </channel>
</rss>
