<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Erich</title>
    <description>The latest articles on DEV Community by Erich (@h0tb0x).</description>
    <link>https://dev.to/h0tb0x</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3495304%2F6d8e9993-312e-44bd-a07b-0ace2c6ad47e.JPG</url>
      <title>DEV Community: Erich</title>
      <link>https://dev.to/h0tb0x</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/h0tb0x"/>
    <language>en</language>
    <item>
      <title>Notes on not getting hired</title>
      <dc:creator>Erich</dc:creator>
      <pubDate>Thu, 12 Mar 2026 19:02:24 +0000</pubDate>
      <link>https://dev.to/h0tb0x/notes-on-not-getting-hired-1ph1</link>
      <guid>https://dev.to/h0tb0x/notes-on-not-getting-hired-1ph1</guid>
      <description>&lt;p&gt;On a whim, I applied to a defense tech company. Their recruiter emailed me three hours later. We had a phone screen the next day. A coding interview the week after. A second coding interview the week after that. Just like that I was in a final loop. One application, no networking, no LinkedIn DMs, no referral.&lt;/p&gt;

&lt;p&gt;I nailed the coding portion. Had a genuinely good conversation with the hiring manager. Then came the system design round.&lt;/p&gt;

&lt;p&gt;I had never done a system design interview before.&lt;/p&gt;

&lt;p&gt;I walked through an architecture and started second-guessing myself out loud. It's exactly as bad as it sounds. You can feel the moment an interview turns. It's like a key that doesn't quite catch. You keep turning it, hoping it'll grab, and it never does. I had no idea what I was doing, and worse, I was demonstrating that fact in real time to several strangers.&lt;/p&gt;

&lt;p&gt;The recruiter called with a rejection two days later. At least it wasn't an automated email.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;1 application. 1 final loop. 0 offers.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;75 applications went out over the next six weeks. August into September. I tracked everything in a spreadsheet. Which company, what role, what stage of the process. The spreadsheet was meticulous. The responses were not. The silence was nearly total. My inbox was empty.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;76 applications. 1 final round. 0 offers.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;January. 85 more applications. I spent the fall building. A search engine in C++, a prediction market arbitrage system in Python, a database in Rust. Things I could point to and say, "I built this, here's how it works, here's why it's cool."&lt;/p&gt;

&lt;p&gt;Still nothing in my inbox. The market did not care. The market does not care.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;161 applications. 1 final round. 0 offers.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;The average job posting attracts around 250 applications. At a recognizable tech company the number is higher. At a large one that number is astronomical.¹ 75% of those resumes are rejected by ATS software before a human ever sees them. Your resume doesn't go into a pile. It goes into a filter.² Wrong keywords, wrong format, wrong anything and you're out before a person weighs in. Run 161 applications through the funnel: roughly 40 reach a human. Roughly 33% of those make it to interview scheduling, which leaves 13. About 32% of those pass the intermediate screening, so 4 should reach a final loop.&lt;/p&gt;

&lt;p&gt;When a resume does reach a recruiter, the initial scan takes around 7 seconds. The average recruiter today manages 2,500+ applications across all their open roles. Screening 500 applications at even 30 seconds each is 4 hours of pure triage before any meaningful conversation happens. 40 applications reviewed for 7 seconds each is 4 minutes and 40 seconds of human attention.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;161 applications. 4 minutes and 40 seconds.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;A final loop at a large tech company runs 4 to 6 interviews. That is after a recruiter screen and one or two technical screens. Then the loop itself contains coding, system design, and behavioral rounds. Six to nine interviews per company before you get a decision. The onsite-to-offer ratio runs about 3:1, so those 4 final loops should produce roughly one offer. I had one final loop. The math says I should have three more loops and an offer. &lt;/p&gt;

&lt;p&gt;In engineering specifically, the average number of interviews per hire is the highest in tech, meaning the conversion rate from interview to offer is the lowest of any sector. A candidate today is about a third as likely to get hired for a role as they were three years ago. On average, it takes 20 total interviews across multiple applications to land one offer.&lt;/p&gt;

&lt;p&gt;The alternative is a referral. The funnel above assumes a cold application. With a referral, your resume skips the filter entirely and goes into the hands of a real person. Industry data puts referred candidates at roughly a 30% hire rate compared to under 3% for cold applications.³ A warm introduction is doing more work than anything in your portfolio.&lt;/p&gt;




&lt;p&gt;Then two things happened in the same month.&lt;/p&gt;

&lt;p&gt;Someone referred me somewhere. One phone screen, then a full loop.&lt;/p&gt;

&lt;p&gt;One of the 85 applications came back to life. No phone screen, no recruiter call. Just an email to schedule a virtual coding interview with a real person. I passed it. Then a full loop.&lt;/p&gt;

&lt;p&gt;Two companies whose rejection emails I'd actually be sad to receive. One referral, one cold application, both arriving at the same destination in the same month.&lt;/p&gt;

&lt;p&gt;Both loops went well enough that I can't tell which way they'll land. That's a strange thing to say after months of inbox silence. The last time I left a final loop I knew exactly how it went. Uncertainty feels better than dread.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;161 applications. 3 final rounds. 0 offers?&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;In college I applied to hundreds of jobs without knowing LeetCode existed. No projects, no interview experience, nothing to show. I thought I deserved a job and that someone should take a chance on me.&lt;/p&gt;

&lt;p&gt;I had a diploma and a lot of confidence in the wrong things.&lt;/p&gt;

&lt;p&gt;Now I have the experience. A search engine. An arbitrage system. A database. Things I built because I wanted to understand how they work.&lt;/p&gt;

&lt;p&gt;Then again, my inbox is still empty.&lt;/p&gt;

&lt;p&gt;In a few months I'll send another 75 or 80 applications. I'll keep building in the meantime.&lt;/p&gt;

&lt;p&gt;I have no lessons for you. I have no job.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;¹ These numbers come from a quick search across general hiring reports. Gem, Standout-CV, Shortlistd, and others. Software-engineering-specific data is harder to isolate cleanly and varies enough across sources that you should treat the numbers as directional rather than precise. The picture they paint is accurate even if the exact percentages aren't.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;² I like to imagine this filter as a printer directly dropping applications into a shredder.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;³ Referral hire rate data comes from separate industry sources and is not derived from the funnel math above. The funnel describes a cold application pipeline. Referral numbers are industry-wide averages. Both are directional.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>career</category>
    </item>
    <item>
      <title>Webcrawling is just a brute force algorithm</title>
      <dc:creator>Erich</dc:creator>
      <pubDate>Tue, 27 Jan 2026 23:45:08 +0000</pubDate>
      <link>https://dev.to/h0tb0x/webcrawling-is-just-a-brute-force-algorithm-2meg</link>
      <guid>https://dev.to/h0tb0x/webcrawling-is-just-a-brute-force-algorithm-2meg</guid>
      <description>&lt;p&gt;Every search engine starts with a crawler. Before ranking algorithms, before inverted indexes, before any of the clever stuff, someone has to actually go get the pages. Google, Bing, DuckDuckGo, all of them. Crawlers are where the data comes from.&lt;/p&gt;

&lt;p&gt;The algorithm is brute force BFS. Visit a page, read the rules, download the content, extract the links, add them to the queue. Repeat until you've visited every page on the internet. Then start over, because pages change.&lt;/p&gt;

&lt;p&gt;That's it. No cleverness, no optimization at the conceptual level. You're just walking a graph, one node at a time, until you've seen all of it. The webcrawler's job is completeness, not efficiency.&lt;/p&gt;

&lt;h2&gt;
  
  
  Level 0: The naive crawler
&lt;/h2&gt;

&lt;p&gt;You need an HTTP client, an HTML parser, and a queue. libcurl handles requests. Any HTML parsing library works, or you can write your own parser if you enjoy suffering. The queue just holds URL strings. You also need a set to track visited URLs or you'll loop forever when site A links to site B links back to site A.&lt;/p&gt;
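
&lt;p&gt;As a rough sketch of that loop (hypothetical helper names, not BloomSearch's actual code; &lt;code&gt;fetch&lt;/code&gt; and &lt;code&gt;extract_links&lt;/code&gt; stand in for libcurl and the parser):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;#include &amp;lt;queue&amp;gt;
#include &amp;lt;string&amp;gt;
#include &amp;lt;unordered_set&amp;gt;
#include &amp;lt;vector&amp;gt;

// Hypothetical stand-ins for the HTTP client (libcurl) and the HTML parser.
std::string fetch(const std::string &amp;amp;url) { return {}; /* HTTP request goes here */ }
std::vector&amp;lt;std::string&amp;gt; extract_links(const std::string &amp;amp;html) { return {}; /* parsing goes here */ }

void crawl(const std::string &amp;amp;seed) {
  std::queue&amp;lt;std::string&amp;gt; frontier;
  std::unordered_set&amp;lt;std::string&amp;gt; visited;  // without this, A -&amp;gt; B -&amp;gt; A loops forever
  frontier.push(seed);
  while (!frontier.empty()) {
    std::string url = frontier.front();
    frontier.pop();
    if (!visited.insert(url).second) continue;  // already crawled
    for (const std::string &amp;amp;link : extract_links(fetch(url))) {
      frontier.push(link);
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;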

&lt;p&gt;This version works. It also gets your IP banned. A tight loop with no delays will send dozens of requests per second to the same server. Admins notice. Rate limiters trigger. Your crawler either gets blocked or your ISP gets complaints.&lt;/p&gt;

&lt;p&gt;If you run this on a cloud provider, you won't just get banned. You'll also receive a bill that makes you reconsider your career choices.&lt;/p&gt;

&lt;h2&gt;
  
  
  Level 1: Politeness
&lt;/h2&gt;

&lt;p&gt;The internet has a convention for this: robots.txt. Every domain can publish one. Go to the base URL of any site right now, add "/robots.txt", and see what it looks like. It specifies which paths crawlers can access and, more importantly, how many seconds to wait between requests.&lt;/p&gt;

&lt;p&gt;Now you need a robots.txt parser and a per-domain delay mechanism. Before each request, check the domain's crawl delay and sleep for that duration.&lt;/p&gt;
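
&lt;p&gt;A minimal sketch of that mechanism, assuming the delay has already been parsed out of robots.txt (names are illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;#include &amp;lt;chrono&amp;gt;
#include &amp;lt;string&amp;gt;
#include &amp;lt;thread&amp;gt;
#include &amp;lt;unordered_map&amp;gt;

// Remember when each domain was last hit and sleep out whatever remains of its delay.
std::unordered_map&amp;lt;std::string, std::chrono::steady_clock::time_point&amp;gt; last_hit;

void wait_for_turn(const std::string &amp;amp;domain, std::chrono::seconds crawl_delay) {
  auto it = last_hit.find(domain);
  if (it != last_hit.end()) {
    auto next_allowed = it-&amp;gt;second + crawl_delay;
    auto now = std::chrono::steady_clock::now();
    if (now &amp;lt; next_allowed) std::this_thread::sleep_for(next_allowed - now);
  }
  last_hit[domain] = std::chrono::steady_clock::now();
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;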

&lt;p&gt;The crawler is now polite. It's also slow. A 5-second crawl delay means 12 pages per minute from one domain. Crawling a million pages at that rate takes longer than you want to wait (~57 days).&lt;/p&gt;

&lt;h2&gt;
  
  
  Level 2: Parallelism
&lt;/h2&gt;

&lt;p&gt;The obvious fix is threads. If you're waiting 5 seconds between requests to domain A, you could be fetching from domains B, C, and D in the meantime. Spawn a pool of worker threads, give them a shared queue, and let each one crawl independently.&lt;/p&gt;

&lt;p&gt;This is where most tutorials stop. It's also where you'll get banned again.&lt;/p&gt;

&lt;p&gt;The problem is subtle. Your queue contains thousands of URLs from hundreds of domains, all interleaved. Thread A grabs example.com/page1. Thread B grabs example.com/page2. Thread C grabs example.com/about. Each thread respects the crawl delay individually. It waits 5 seconds after its own last request to that domain. But threads don't know about each other. All three check their own timers, see no recent request, and fire simultaneously. The server sees three requests in the same instant. Your distributed politeness is actually coordinated rudeness.&lt;/p&gt;

&lt;h2&gt;
  
  
  Level 3: Coordinated politeness
&lt;/h2&gt;

&lt;p&gt;The solution is to centralize the coordination. Instead of each worker managing its own timing, you push that responsibility into the frontier itself.&lt;/p&gt;

&lt;p&gt;Workers do three things: they register a domain's required delay (parsed from robots.txt), they mark when they've hit a domain, and they ask for the next URL. The frontier only hands out a URL when that domain's crawl delay has elapsed. Workers don't sleep manually. They just ask for work. The frontier decides when work is available.&lt;/p&gt;

&lt;p&gt;This inverts the control. Workers don't coordinate with each other. They don't even know each other exists. They all talk to the frontier, and the frontier enforces the timing. One mutex, one source of truth, no race conditions between workers checking timestamps simultaneously.&lt;/p&gt;
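
&lt;p&gt;A minimal sketch of that frontier, assuming domain extraction and robots.txt parsing happen elsewhere (illustrative, not BloomSearch's actual code):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;#include &amp;lt;chrono&amp;gt;
#include &amp;lt;mutex&amp;gt;
#include &amp;lt;optional&amp;gt;
#include &amp;lt;queue&amp;gt;
#include &amp;lt;string&amp;gt;
#include &amp;lt;unordered_map&amp;gt;

// One mutex, one source of truth. Workers only ever call these three methods.
class Frontier {
 public:
  void register_delay(const std::string &amp;amp;domain, std::chrono::seconds delay) {
    std::lock_guard&amp;lt;std::mutex&amp;gt; lock(mu_);
    delay_[domain] = delay;
  }
  void add(const std::string &amp;amp;domain, const std::string &amp;amp;url) {
    std::lock_guard&amp;lt;std::mutex&amp;gt; lock(mu_);
    queues_[domain].push(url);
  }
  // Hands out a URL only when that domain's crawl delay has elapsed.
  std::optional&amp;lt;std::string&amp;gt; next() {
    std::lock_guard&amp;lt;std::mutex&amp;gt; lock(mu_);
    auto now = std::chrono::steady_clock::now();
    for (auto &amp;amp;[domain, urls] : queues_) {
      if (urls.empty() || now - last_hit_[domain] &amp;lt; delay_[domain]) continue;
      last_hit_[domain] = now;  // the hit is marked here, at hand-out time
      std::string url = urls.front();
      urls.pop();
      return url;
    }
    return std::nullopt;  // nothing is ready yet; the worker just asks again
  }

 private:
  std::mutex mu_;
  std::unordered_map&amp;lt;std::string, std::queue&amp;lt;std::string&amp;gt;&amp;gt; queues_;
  std::unordered_map&amp;lt;std::string, std::chrono::seconds&amp;gt; delay_;
  std::unordered_map&amp;lt;std::string, std::chrono::steady_clock::time_point&amp;gt; last_hit_;
};
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;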

&lt;p&gt;Now you have a crawler that could index the entire internet. The only obstacles are time, money, storage, compute, and reality.&lt;/p&gt;

&lt;h2&gt;
  
  
  Reality
&lt;/h2&gt;

&lt;p&gt;I've only described the happy path. Real crawlers hit messier problems.&lt;/p&gt;

&lt;p&gt;DNS resolution blocks. Every URL needs a DNS lookup, and those can take seconds. If your threads block on DNS, your parallelism disappears. You either need async DNS or a caching layer.&lt;/p&gt;

&lt;p&gt;Memory pressure builds. A million URLs in a queue takes real memory. A visited set with a million entries takes more. You eventually need to spill to disk or use probabilistic data structures like bloom filters.&lt;/p&gt;

&lt;p&gt;HTML is cursed. Real-world pages have malformed tags, broken encoding, and markup that would make the W3C weep. Your parser will encounter things that technically shouldn't exist. It needs to not crash.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I built
&lt;/h2&gt;

&lt;p&gt;If this was interesting, check out &lt;a href="https://github.com/H0TB0X420/BloomSearch" rel="noopener noreferrer"&gt;BloomSearch&lt;/a&gt;, a search engine I built that uses a crawler like this to index the web.&lt;/p&gt;

</description>
      <category>algorithms</category>
      <category>computerscience</category>
      <category>tutorial</category>
      <category>web</category>
    </item>
    <item>
      <title>The LLM Imposter</title>
      <dc:creator>Erich</dc:creator>
      <pubDate>Wed, 21 Jan 2026 15:24:24 +0000</pubDate>
      <link>https://dev.to/h0tb0x/the-llm-imposter-2072</link>
      <guid>https://dev.to/h0tb0x/the-llm-imposter-2072</guid>
      <description>&lt;p&gt;A few weeks ago I finished a project that actually works. Handles real data, solves a real problem, runs well. I'm proud of it. I'm also... something else. Not ashamed exactly. Just aware of a voice I can't shake: &lt;em&gt;You didn't really do this.&lt;/em&gt; &lt;strong&gt;&lt;em&gt;This doesn't count.&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I used LLMs¹ heavily throughout the process. Not vibe coding. I wasn't just prompting "build me a thing" and shipping whatever came out. I made architectural decisions, debugged failures, understood trade-offs. But still. I can't shake that voice.&lt;/p&gt;

&lt;p&gt;There's an image of what a "real programmer" looks like. Someone who writes syntax from memory, who suffered through documentation for years, who earned their skills through late nights and cryptic error messages. The suffering was the point. If you didn't struggle, you didn't learn.&lt;/p&gt;

&lt;p&gt;I internalized that standard somewhere along the way. And by that standard, using an LLM to accelerate past the friction feels like skipping the exam.&lt;/p&gt;

&lt;p&gt;But this isn't the first time the standard changed.&lt;/p&gt;

&lt;p&gt;Every abstraction layer in programming history faced the same resistance. Assembly to C: "You're hiding the machine, you'll never understand what's actually happening." C to managed languages: "Garbage collection? Memory management &lt;em&gt;is&lt;/em&gt; the job." Using libraries: "You're importing code you've never read." And for a decade straight: "You're not a real engineer, you're just copying from Stack Overflow."&lt;/p&gt;

&lt;p&gt;Each time, skeptics said the new way wasn't real programming. Each time, they were defending a standard that was about to become obsolete.&lt;/p&gt;

&lt;p&gt;The abstraction didn't eliminate the need for understanding. You didn't need to manage registers anymore, but you still needed to understand performance. You didn't need to manually free memory, but you still needed to know why your program was leaking.&lt;/p&gt;

&lt;p&gt;The programmers who insisted assembly was the only "real" programming were guarding a gate nobody needed to pass through anymore. Not because they were wrong about assembly being powerful. Because they were wrong about what mattered.&lt;/p&gt;

&lt;p&gt;So what matters this time?&lt;/p&gt;

&lt;p&gt;Code became cheap. Producing working syntax is commoditized now. An LLM can generate a function faster than I can type the signature. Maybe I just type slow.&lt;/p&gt;

&lt;p&gt;But software is still expensive.&lt;/p&gt;

&lt;p&gt;Knowing which components the system needs, how they interact, where it will fail at scale, what trade-offs you're making. None of that got cheaper. The LLM produces parts. It doesn't know which parts matter or where they go.&lt;/p&gt;

&lt;p&gt;Think about what it means to be a mechanic. A parts supplier can hand you a carburetor². That's the easy part. Being a mechanic means knowing where it goes, how it connects to everything else, whether this particular carburetor is right for this particular engine. It means looking at a car that won't start and tracing the problem backward through systems you understand. It means knowing that a failing fuel pump will starve the engine. It means finding out the problem wasn't with the carburetor at all.&lt;/p&gt;

&lt;p&gt;Anyone can order parts. The mechanic knows why the car runs.&lt;/p&gt;

&lt;p&gt;Vibe coding is ordering parts and bolting them on until something happens. Sometimes you get a car. Usually you get an expensive mess that breaks in ways you can't diagnose because you never understood how it was supposed to work in the first place.&lt;/p&gt;

&lt;p&gt;The friction didn't disappear when LLMs arrived. It relocated.&lt;/p&gt;

&lt;p&gt;The old slog was syntax memorization, Stack Overflow archaeology, decoding documentation written by someone who hated you. The new slog is architecture, system design, evaluating outputs, catching "You're absolutely right!" mistakes, knowing when generated code is subtly wrong in ways that won't surface until production.&lt;/p&gt;

&lt;p&gt;Different friction. Still friction. Still earns the outcome.&lt;/p&gt;

&lt;p&gt;I've stopped asking myself "did I use AI to build this?"&lt;/p&gt;

&lt;p&gt;The better question: if this breaks, can &lt;strong&gt;I&lt;/strong&gt; fix it?&lt;/p&gt;

&lt;p&gt;If yes, you built it. The tool you used to get there is irrelevant. If no, you have a pile of parts and a prayer.&lt;/p&gt;

&lt;p&gt;I can debug my project. I can explain why the components exist and what they do. I can extend it, refactor it, reason about its failure modes. The LLM accelerated the syntax production. The engineering was mine.&lt;/p&gt;

&lt;p&gt;That voice is real. But it's grading me on a standard for a game that already changed.&lt;/p&gt;

&lt;p&gt;Besides, I use VS Code. Half the internet already doesn't think I'm a real programmer.&lt;/p&gt;




&lt;p&gt;¹ Large Language Model - it's worth distinguishing from the broader "AI" label.&lt;/p&gt;

&lt;p&gt;² Intentionally obsolete car part.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>discuss</category>
    </item>
    <item>
      <title>HFT-Lite: Prediction market arbitrage engine</title>
      <dc:creator>Erich</dc:creator>
      <pubDate>Sun, 14 Dec 2025 21:58:50 +0000</pubDate>
      <link>https://dev.to/h0tb0x/hft-lite-prediction-market-arbitrage-engine-51lc</link>
      <guid>https://dev.to/h0tb0x/hft-lite-prediction-market-arbitrage-engine-51lc</guid>
      <description>&lt;h2&gt;
  
  
  How it works
&lt;/h2&gt;

&lt;p&gt;The system connects to Kalshi and Interactive Brokers ForecastEx via WebSockets. Events are mapped to a unified symbol config tracking equivalent contracts across platforms. Market data is normalized into a central order book where an arbitrage detector continuously scans for cross-venue mispricings.&lt;/p&gt;

&lt;p&gt;When complementary contracts (YES on one exchange, NO on the other) can be purchased for less than the guaranteed $1.00 settlement minus fees, both trades execute. The outcome doesn't matter; one side always pays.&lt;/p&gt;
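
&lt;p&gt;The core check is simple. A toy version of the condition (illustrative only, not the project's actual code; the fee figure below is made up and real fee handling is per-exchange):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;#include &amp;lt;iostream&amp;gt;

// Buying YES on one venue and NO on the other locks in a $1.00 settlement.
// The trade is only worth taking if the combined cost plus fees stays under $1.00.
bool is_arbitrage(double yes_price, double no_price, double fees) {
  return yes_price + no_price + fees &amp;lt; 1.0;
}

int main() {
  double yes = 0.68, no = 0.26, fees = 0.02;  // example prices; the fee figure is hypothetical
  if (is_arbitrage(yes, no, fees)) {
    std::cout &amp;lt;&amp;lt; "locked-in edge per contract: " &amp;lt;&amp;lt; (1.0 - yes - no - fees) &amp;lt;&amp;lt; "\n";
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;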

&lt;p&gt;Current scope: political and economic events (Fed decisions, presidential nominations, Senate majority control, House majority control).&lt;/p&gt;

&lt;h2&gt;
  
  
  Results
&lt;/h2&gt;

&lt;p&gt;Pure arbitrage opportunities showed up immediately. In 35 minutes of monitoring:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Event&lt;/th&gt;
&lt;th&gt;Kalshi&lt;/th&gt;
&lt;th&gt;IBKR&lt;/th&gt;
&lt;th&gt;Combined&lt;/th&gt;
&lt;th&gt;Net Margin&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;SENATE_2026_REP&lt;/td&gt;
&lt;td&gt;YES @ $0.68&lt;/td&gt;
&lt;td&gt;NO @ $0.26&lt;/td&gt;
&lt;td&gt;$0.94&lt;/td&gt;
&lt;td&gt;2.62%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;HOUSE_2026_REP&lt;/td&gt;
&lt;td&gt;YES @ $0.27&lt;/td&gt;
&lt;td&gt;NO @ $0.69&lt;/td&gt;
&lt;td&gt;$0.96&lt;/td&gt;
&lt;td&gt;0.62%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SENATE_2026_DEM&lt;/td&gt;
&lt;td&gt;NO @ $0.68&lt;/td&gt;
&lt;td&gt;YES @ $0.28&lt;/td&gt;
&lt;td&gt;$0.96&lt;/td&gt;
&lt;td&gt;0.49%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The catch: these contracts settle February 1, 2027. That's 414 days of capital lock-up for 1-3% return. Treasury bills pay better.&lt;/p&gt;

&lt;h2&gt;
  
  
  Risks
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Leg risk&lt;/strong&gt; is the main concern. If one side fills and the other doesn't, the system rolls back by buying the opposite side on the filled exchange. Loss is limited to fees, but it's still a loss.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Regulatory uncertainty&lt;/strong&gt; hangs over the entire space. Prediction markets occupy gray legal territory. Platforms could face restrictions that impact liquidity or access. That means holding cash and positions on an exchange that suddenly has problems.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's next
&lt;/h2&gt;

&lt;p&gt;The current margins only make sense with shorter-term contracts. Weekly or daily events reduce capital lock-up and make 1-3% spreads worthwhile.&lt;/p&gt;

&lt;p&gt;The next evolution of this system is comparing options-implied probabilities to prediction market prices. If SPY options imply a 30% probability of closing between $595 and $600, and Kalshi has that bracket at 15 cents, someone's wrong. Retail prediction markets are probably the soft target.&lt;/p&gt;

&lt;p&gt;Other improvements need to be made as well. On the execution side, parallel order placement and Kelly criterion position sizing. On infrastructure, WebSocket reconnection handling and a real-time dashboard. Risk management needs category exposure limits and correlation tracking. Holding ten different contracts with the same outcome isn't diversification.&lt;/p&gt;

&lt;p&gt;Check out the project at the link below. &lt;/p&gt;




&lt;p&gt;&lt;a href="https://github.com/H0TB0X420/HFT-Lite" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;/p&gt;

</description>
      <category>python</category>
    </item>
    <item>
      <title>One bug, nine errors: what templates actually are</title>
      <dc:creator>Erich</dc:creator>
      <pubDate>Wed, 10 Dec 2025 19:06:58 +0000</pubDate>
      <link>https://dev.to/h0tb0x/one-bug-nine-errors-what-templates-actually-are-7nm</link>
      <guid>https://dev.to/h0tb0x/one-bug-nine-errors-what-templates-actually-are-7nm</guid>
      <description>&lt;p&gt;&lt;em&gt;Part 4 of "You Didn't Learn C++ in College"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I'm four weeks into CMU's 15-445 Database Systems project on B+Trees. My code compiles on the previous commit. I changed one line. The compiler responds with nine identical errors. Same bug, reported nine times, each with a different template instantiation. &lt;/p&gt;

&lt;p&gt;Nine opportunities to learn about templates.&lt;/p&gt;

&lt;h2&gt;
  
  
  The project that broke me
&lt;/h2&gt;

&lt;p&gt;CMU 15-445 has you build a database storage engine from scratch. Project 1 is a buffer pool manager. Project 2 is a B+Tree index that sits on top of it. The B+Tree stores key-value pairs where both the key type and value type are template parameters. Your tree needs to work with 4-byte keys, 8-byte keys, 16-byte keys, 64-byte composite keys. All without writing separate implementations for each.&lt;/p&gt;

&lt;p&gt;The tree class has three template parameters: KeyType, ValueType, and KeyComparator. Every method you write needs to handle any combination. And your B+Tree pages live in the buffer pool as raw memory that you cast into typed nodes. One wrong type and you're debugging memory corruption. No one wants to do that.&lt;/p&gt;

&lt;p&gt;I can't post the implementation (course policy), but the template structure is public. Three type parameters, dozens of methods, all generic over key and value types.&lt;/p&gt;

&lt;h2&gt;
  
  
  What college taught me about templates
&lt;/h2&gt;

&lt;p&gt;My undergraduate C++ course covered templates in maybe two lectures. "Here's &lt;code&gt;vector&amp;lt;int&amp;gt;&lt;/code&gt;, here's &lt;code&gt;vector&amp;lt;string&amp;gt;&lt;/code&gt;, templates let you reuse code." That's pretty much it, two lectures condensed into two sentences.&lt;/p&gt;

&lt;p&gt;I thought templates were syntax sugar. Write one function, use it with different types. The textbooks show &lt;code&gt;template&amp;lt;typename T&amp;gt; T max(T a, T b)&lt;/code&gt; and I assumed the compiler did something clever at runtime to figure out the types.&lt;/p&gt;

&lt;p&gt;I was completely wrong.&lt;/p&gt;

&lt;h2&gt;
  
  
  Templates generate code at compile time
&lt;/h2&gt;

&lt;p&gt;When you write &lt;code&gt;BPlusTree&amp;lt;GenericKey&amp;lt;8&amp;gt;, RID, GenericComparator&amp;lt;8&amp;gt;&amp;gt;&lt;/code&gt;, the compiler doesn't create a generic class that handles all types. It generates a completely new class. Specific to those exact types. With its own machine code.&lt;/p&gt;

&lt;p&gt;Two instantiations with different key sizes are two entirely separate classes. The 8-byte version has no idea the 16-byte version exists. They don't share code. They don't share vtables. The compiler literally generates distinct implementations and compiles them independently. C++ templates have no runtime component. By the time your program runs, the templates are gone. They are replaced by concrete, specialized code for each type combination you actually used. &lt;/p&gt;
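
&lt;p&gt;A toy example of the same idea (nothing to do with BusTub, just the instantiation mechanics):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;#include &amp;lt;type_traits&amp;gt;

template &amp;lt;typename Key, int KeySize&amp;gt;
struct Node {
  Key keys[KeySize];  // the size is baked into the generated class
};

// Two instantiations are two unrelated types with different layouts.
static_assert(!std::is_same_v&amp;lt;Node&amp;lt;int, 8&amp;gt;, Node&amp;lt;int, 16&amp;gt;&amp;gt;);
static_assert(sizeof(Node&amp;lt;int, 16&amp;gt;) == 2 * sizeof(Node&amp;lt;int, 8&amp;gt;));
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;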

&lt;h2&gt;
  
  
  One bug, nine errors
&lt;/h2&gt;

&lt;p&gt;Here's an actual error I caused. I passed a &lt;code&gt;page_id_t&lt;/code&gt; (an int) where the function expected an &lt;code&gt;RID&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;error: reference to type 'const bustub::RID' could not bind to 
       an lvalue of type 'bustub::page_id_t' (aka 'int')
    leaf-&amp;gt;SetValueAt(insert_index, cause_error);
                                   ^~~~~~~~~~~
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Clear enough. But the compiler didn't report one error. It reported nine:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;note: in instantiation of member function 
  'bustub::BPlusTree&amp;lt;bustub::GenericKey&amp;lt;4&amp;gt;, bustub::RID, 
   bustub::GenericComparator&amp;lt;4&amp;gt;, 0&amp;gt;::InsertWithCrabbing' requested here
template class BPlusTree&amp;lt;GenericKey&amp;lt;4&amp;gt;, RID, GenericComparator&amp;lt;4&amp;gt;&amp;gt;;
               ^
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then the same error for &lt;code&gt;GenericKey&amp;lt;8&amp;gt;&lt;/code&gt;. Then &lt;code&gt;GenericKey&amp;lt;16&amp;gt;&lt;/code&gt;. Then &lt;code&gt;GenericKey&amp;lt;32&amp;gt;&lt;/code&gt;. Then &lt;code&gt;GenericKey&amp;lt;64&amp;gt;&lt;/code&gt;. Plus variants with different fourth template parameters.&lt;/p&gt;

&lt;p&gt;The bottom of the B+Tree implementation file has explicit template instantiations. Lines that tell the compiler: generate complete code for each of these type combinations right here, in this translation unit. The codebase does this so linking works correctly. Template code normally lives in headers, but explicit instantiation lets you put implementations in &lt;code&gt;.cpp&lt;/code&gt; files.&lt;/p&gt;
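
&lt;p&gt;Reduced to a toy (not the course code), the pattern looks like this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;#include &amp;lt;string&amp;gt;
#include &amp;lt;vector&amp;gt;

// stack.h -- declarations only
template &amp;lt;typename T&amp;gt;
class Stack {
 public:
  void push(const T &amp;amp;value);
 private:
  std::vector&amp;lt;T&amp;gt; data_;
};

// stack.cpp -- definitions plus explicit instantiations
template &amp;lt;typename T&amp;gt;
void Stack&amp;lt;T&amp;gt;::push(const T &amp;amp;value) { data_.push_back(value); }

// "Generate complete code for these types here, in this translation unit."
template class Stack&amp;lt;int&amp;gt;;
template class Stack&amp;lt;std::string&amp;gt;;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;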

&lt;p&gt;My one type error existed in a method called &lt;code&gt;InsertWithCrabbing&lt;/code&gt;. The compiler instantiated that method nine times, once per explicit instantiation. Each instantiation hit the same bug. Nine identical errors, each with its own "in instantiation of" note showing which type combination triggered it.&lt;/p&gt;

&lt;p&gt;The error itself was on line 311. The instantiation requests were on lines 1383-1395. A thousand lines apart in the output, connected by template machinery. Once I understood that each "note: in instantiation of" was just the compiler saying "I tried to generate code for this type combination and hit your bug," the error dump became readable.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why databases use templates
&lt;/h2&gt;

&lt;p&gt;After staring at enough error messages, the design made sense. Databases need type-specific code without paying for runtime polymorphism.&lt;/p&gt;

&lt;p&gt;Consider the alternative with virtual functions. Every operation requires a virtual function call through a vtable. The compiler can't inline across the indirection. You end up casting &lt;code&gt;void*&lt;/code&gt; everywhere, losing type safety. And if you want to optimize key comparisons for different key sizes, you need runtime branches.&lt;/p&gt;

&lt;p&gt;Templates eliminate all of this. The compiler sees the exact types at compile time. For &lt;code&gt;BPlusTree&amp;lt;GenericKey&amp;lt;8&amp;gt;, RID, GenericComparator&amp;lt;8&amp;gt;&amp;gt;&lt;/code&gt;, it generates code that operates directly on 8-byte keys. No indirection. No vtables. The optimizer can inline the comparator, see through all the abstractions, and generate tight machine code.&lt;/p&gt;

&lt;p&gt;This is what C++ people mean by "zero-overhead abstraction." You write generic code, the compiler generates specialized code. The abstraction costs nothing at runtime because it doesn't exist at runtime.&lt;/p&gt;

&lt;h2&gt;
  
  
  Compile-time specialization
&lt;/h2&gt;

&lt;p&gt;The template power in BusTub is straightforward: the compiler generates specialized code for each key size, and all the size calculations happen at compile time.&lt;/p&gt;

&lt;p&gt;The comparator shows how non-type template parameters work:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="k"&gt;template&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt; &lt;span class="n"&gt;KeySize&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;GenericComparator&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
 &lt;span class="nl"&gt;public:&lt;/span&gt;
  &lt;span class="kr"&gt;inline&lt;/span&gt; &lt;span class="k"&gt;auto&lt;/span&gt; &lt;span class="k"&gt;operator&lt;/span&gt;&lt;span class="p"&gt;()(&lt;/span&gt;&lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="n"&gt;GenericKey&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;KeySize&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;lhs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
                         &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="n"&gt;GenericKey&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;KeySize&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;rhs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;uint32_t&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;key_schema_&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;GetColumnCount&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="n"&gt;Value&lt;/span&gt; &lt;span class="n"&gt;lhs_value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;lhs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ToValue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key_schema_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="n"&gt;Value&lt;/span&gt; &lt;span class="n"&gt;rhs_value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rhs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ToValue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key_schema_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

      &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lhs_value&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CompareLessThan&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rhs_value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;CmpBool&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;CmpTrue&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lhs_value&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CompareGreaterThan&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rhs_value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;CmpBool&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;CmpTrue&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
 &lt;span class="k"&gt;private&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
  &lt;span class="n"&gt;Schema&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;key_schema_&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;KeySize&lt;/code&gt; isn't a type, it's a compile-time constant. &lt;code&gt;GenericComparator&amp;lt;8&amp;gt;&lt;/code&gt; only compares &lt;code&gt;GenericKey&amp;lt;8&amp;gt;&lt;/code&gt; values. Try to compare a &lt;code&gt;GenericKey&amp;lt;16&amp;gt;&lt;/code&gt; and you get a type error at compile time, not a runtime bug. The template parameter acts as a compile-time constraint that prevents mismatched key sizes from ever reaching production.&lt;/p&gt;

&lt;p&gt;This is why the codebase has those explicit instantiations. The database knows it will index columns of certain sizes. Rather than let templates instantiate lazily and potentially bloat binary size with unused combinations, explicit instantiation says: generate exactly these versions, nothing else.&lt;/p&gt;

&lt;h2&gt;
  
  
  What C++20 fixed (and what you'll still encounter)
&lt;/h2&gt;

&lt;p&gt;The 15-445 codebase uses C++17. Modern C++ has concepts, which make template constraints explicit and readable:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="k"&gt;template&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="k"&gt;typename&lt;/span&gt; &lt;span class="nc"&gt;T&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="k"&gt;requires&lt;/span&gt; &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;integral&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="n"&gt;T&lt;/span&gt; &lt;span class="nf"&gt;square&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And the error messages become human-readable:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;error: cannot call square with type 'GenericKey&amp;lt;8&amp;gt;'
note: constraints not satisfied: std::integral&amp;lt;GenericKey&amp;lt;8&amp;gt;&amp;gt; evaluated to false
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One line telling you exactly what went wrong.&lt;/p&gt;

&lt;h2&gt;
  
  
  Rust learned from this
&lt;/h2&gt;

&lt;p&gt;Rust calls template instantiation "monomorphization" and builds constraints into the language from day one.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="n"&gt;compare&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Ord&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Ordering&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="nf"&gt;.cmp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;T: Ord&lt;/code&gt; is a trait bound. If you try to call this with a type that doesn't implement &lt;code&gt;Ord&lt;/code&gt;, the error tells you exactly that. No instantiation chain. No nine repeated errors.&lt;/p&gt;

&lt;p&gt;Rust also catches constraint violations where you define the generic function, not where you call it. C++ templates don't check that &lt;code&gt;KeyType&lt;/code&gt; has the methods you need until instantiation. Rust checks the trait bounds immediately. This is what happens when language designers learn from 30 years of C++ template errors.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I actually learned
&lt;/h2&gt;

&lt;p&gt;The B+Tree project took me a month. I'm currently on a break from it because implementing concurrent access with latching broke my brain. But debugging template errors taught me something I never got from college.&lt;/p&gt;

&lt;p&gt;Templates are a code generation system. When you write a template, you're writing instructions for the compiler to follow when it generates real code. Each instantiation creates a new, specialized version. The error messages are repetitive because the compiler hits your bug once per instantiation. Understanding this changes how you debug. &lt;/p&gt;

&lt;p&gt;The B+Tree uses templates because database indexes need to work with arbitrary key types while generating optimal code for each one. Virtual functions would add overhead on every comparison, every key copy, every node traversal. Templates let you write the generic algorithm once and get specialized assembly for each key type. The compile-time pain is the price for runtime performance.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Next: Part 5 covers move semantics&lt;/em&gt;&lt;/p&gt;

</description>
      <category>cpp</category>
      <category>programming</category>
    </item>
    <item>
      <title>Smart pointers: memory safety without garbage collection</title>
      <dc:creator>Erich</dc:creator>
      <pubDate>Mon, 24 Nov 2025 20:07:39 +0000</pubDate>
      <link>https://dev.to/h0tb0x/smart-pointers-memory-safety-without-garbage-collection-5674</link>
      <guid>https://dev.to/h0tb0x/smart-pointers-memory-safety-without-garbage-collection-5674</guid>
      <description>&lt;p&gt;&lt;em&gt;Part 3 of "You Didn't Learn C++ in College"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I'm building a web crawler to learn C++ and understand how search engines work. Not a toy project that crawls ten pages and calls it done, but something that needs to run for hours, handle thousands of URLs, and not explode. This means dealing with the reality that every college programming project conveniently ignores: programs that actually stay running.&lt;/p&gt;

&lt;p&gt;My college data structures course taught &lt;code&gt;new&lt;/code&gt; and &lt;code&gt;delete&lt;/code&gt;, then handed us assignments that ran for 30 seconds and exited. Memory leaks? Dangling pointers? "Just be careful" was the advice. The assignments ended before the leaks mattered. Those short-lived programs never exposed the problems with manual memory management.&lt;/p&gt;

&lt;p&gt;A web crawler runs for hours and processes thousands of documents. Miss a single &lt;code&gt;delete&lt;/code&gt; in an error path, and you leak memory on every failed HTTP request. Forget to clean up when a parsing exception gets thrown, and memory usage climbs until the system kills your process. Delete the same object twice because two threads finished at the same time, and the program crashes with a memory corruption error that's nearly impossible to debug.&lt;/p&gt;

&lt;p&gt;Raw pointers and manual &lt;code&gt;delete&lt;/code&gt; calls don't scale to long-running programs. So for this crawler, I'm using smart pointers from the start. They're RAII applied to memory management, and they make the whole "be careful" thing obsolete.&lt;/p&gt;

&lt;h2&gt;
  
  
  What smart pointers actually are
&lt;/h2&gt;

&lt;p&gt;A smart pointer is a class that wraps a raw pointer and manages its lifetime. When the smart pointer goes out of scope, it automatically deletes the object it owns. The destructor does the cleanup. Every exit path, every exception, every early return, the object gets deleted exactly once.&lt;/p&gt;

&lt;p&gt;C++ provides three types in the standard library: &lt;code&gt;unique_ptr&lt;/code&gt; for single ownership, &lt;code&gt;shared_ptr&lt;/code&gt; for shared ownership with reference counting, and &lt;code&gt;weak_ptr&lt;/code&gt; for non-owning observation. Each solves different ownership patterns.&lt;/p&gt;

&lt;p&gt;In the crawler, every HTTP response will need a parser. The crawler creates the parser, uses it to extract links and content, then should destroy it. With raw pointers, I'd need &lt;code&gt;delete&lt;/code&gt; calls after normal completion, after parsing errors, after network timeouts, after receiving invalid HTML. Miss one path and memory leaks.&lt;/p&gt;

&lt;p&gt;With &lt;code&gt;unique_ptr&lt;/code&gt;, the parser gets deleted automatically when I'm done with it. The function will create a parser using &lt;code&gt;make_unique&lt;/code&gt;, fetch HTML content, and process it. If the fetch returns empty, the function returns early and the parser destructor runs automatically. If parsing throws an exception, the stack unwinds and the parser destructor runs. On normal completion, the function ends and the parser destructor runs. Every path works correctly without manual cleanup scattered everywhere.&lt;/p&gt;
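
&lt;p&gt;A sketch of that shape (&lt;code&gt;HtmlParser&lt;/code&gt; and &lt;code&gt;fetch&lt;/code&gt; are placeholders, not the crawler's real API):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;#include &amp;lt;memory&amp;gt;
#include &amp;lt;string&amp;gt;

struct HtmlParser {
  void parse(const std::string &amp;amp;html) { /* may throw on malformed input */ }
  void extract_links() { /* ... */ }
};
std::string fetch(const std::string &amp;amp;url) { return "stub"; /* HTTP request goes here */ }

void process(const std::string &amp;amp;url) {
  auto parser = std::make_unique&amp;lt;HtmlParser&amp;gt;();  // this scope owns the parser
  std::string html = fetch(url);
  if (html.empty()) return;     // early return: the destructor still runs, nothing leaks
  parser-&amp;gt;parse(html);          // if this throws, stack unwinding runs the destructor
  parser-&amp;gt;extract_links();
}                                // normal completion: the destructor runs here
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;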

&lt;h2&gt;
  
  
  unique_ptr: single ownership
&lt;/h2&gt;

&lt;p&gt;A &lt;code&gt;unique_ptr&lt;/code&gt; owns exactly one object and cannot be copied, only moved. This transfers ownership explicitly. When the &lt;code&gt;unique_ptr&lt;/code&gt; goes out of scope or gets reset, it calls &lt;code&gt;delete&lt;/code&gt; on the object it owns.&lt;/p&gt;

&lt;p&gt;The performance cost is zero. A &lt;code&gt;unique_ptr&amp;lt;T&amp;gt;&lt;/code&gt; compiles to the exact same assembly as a raw &lt;code&gt;T*&lt;/code&gt; pointer. The compiler optimizes away the wrapper completely. You get automatic memory management at no runtime cost.&lt;/p&gt;

&lt;p&gt;The crawler's URL queue will use this pattern. Each URL gets fetched exactly once, and one component owns that work. The queue will store crawl tasks wrapped in &lt;code&gt;unique_ptr&lt;/code&gt;. Each task contains a URL, a depth counter for limiting how deep the crawler goes, and the logic to fetch and process that URL. When I need to process a task, I'll pop it from the queue by moving ownership out. The queue no longer owns it, the processing function now owns it. When processing completes, the task goes out of scope and gets deleted automatically.&lt;/p&gt;

&lt;p&gt;This prevents the bug where the queue thinks it still owns the task and tries to delete it while another thread is using it. Move semantics make this impossible. Once ownership transfers out, the queue has nothing. It can't accidentally delete something it no longer owns. The compiler enforces this. Try to copy a &lt;code&gt;unique_ptr&lt;/code&gt; and the code won't compile.&lt;/p&gt;

&lt;p&gt;The type system documents who's responsible for cleanup. The queue owns tasks, processing borrows them temporarily. No ambiguity about whose job it is to call &lt;code&gt;delete&lt;/code&gt;.&lt;/p&gt;
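
&lt;p&gt;A sketch of the planned queue (&lt;code&gt;CrawlTask&lt;/code&gt; is a placeholder, not the real class):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;#include &amp;lt;memory&amp;gt;
#include &amp;lt;queue&amp;gt;
#include &amp;lt;string&amp;gt;

struct CrawlTask {
  std::string url;
  int depth = 0;
  void run() { /* fetch, parse, enqueue discovered links */ }
};

std::queue&amp;lt;std::unique_ptr&amp;lt;CrawlTask&amp;gt;&amp;gt; tasks;

void process_next() {
  if (tasks.empty()) return;
  auto task = std::move(tasks.front());  // ownership transfers out of the queue
  tasks.pop();                           // the queue keeps nothing it could double-delete
  task-&amp;gt;run();
  // auto copy = task;                   // would not compile: unique_ptr cannot be copied
}                                        // task goes out of scope: deleted exactly once
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;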

&lt;h2&gt;
  
  
  shared_ptr: when you need shared ownership
&lt;/h2&gt;

&lt;p&gt;The crawler will maintain a cache of parsed robots.txt files. Multiple URLs from the same domain need to check the same robots.txt. The cache owns these files, but active crawl tasks also need access to them. The file shouldn't be deleted until both the cache evicts it and all tasks using it complete.&lt;/p&gt;

&lt;p&gt;This needs shared ownership. Multiple &lt;code&gt;shared_ptr&lt;/code&gt; instances can point to the same object. A reference count tracks how many owners exist. When a new &lt;code&gt;shared_ptr&lt;/code&gt; copies from an existing one, the reference count increments. When a &lt;code&gt;shared_ptr&lt;/code&gt; gets destroyed, the reference count decrements. When the count hits zero, the last &lt;code&gt;shared_ptr&lt;/code&gt; deletes the object.&lt;/p&gt;

&lt;p&gt;The cache will store robots.txt files as &lt;code&gt;shared_ptr&lt;/code&gt;. When a task needs to check if a URL is allowed, it asks the cache for that domain's robots.txt. The cache returns a copy of the &lt;code&gt;shared_ptr&lt;/code&gt;, incrementing the reference count. Now both the cache and the task own the robots.txt. If the cache decides to evict that entry to save memory, it can delete its copy of the &lt;code&gt;shared_ptr&lt;/code&gt;. The reference count decrements but doesn't hit zero because the task still owns a copy. The robots.txt stays alive. When the task finishes and its &lt;code&gt;shared_ptr&lt;/code&gt; gets destroyed, the reference count hits zero and the robots.txt gets deleted.&lt;/p&gt;

&lt;p&gt;No dangling pointers. No use-after-free bugs. The task can safely use the robots.txt even after the cache evicted it.&lt;/p&gt;
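
&lt;p&gt;A sketch of the planned cache (&lt;code&gt;RobotsTxt&lt;/code&gt; is a placeholder type):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;#include &amp;lt;memory&amp;gt;
#include &amp;lt;string&amp;gt;
#include &amp;lt;unordered_map&amp;gt;

struct RobotsTxt { /* parsed rules and crawl delay */ };

class RobotsCache {
 public:
  void put(const std::string &amp;amp;domain, std::shared_ptr&amp;lt;RobotsTxt&amp;gt; robots) {
    cache_[domain] = std::move(robots);
  }
  std::shared_ptr&amp;lt;RobotsTxt&amp;gt; get(const std::string &amp;amp;domain) {
    auto it = cache_.find(domain);
    if (it == cache_.end()) return nullptr;  // not cached (or already evicted)
    return it-&amp;gt;second;  // copying bumps the reference count: the task now co-owns it
  }
  void evict(const std::string &amp;amp;domain) {
    cache_.erase(domain);  // count drops; the object survives while any task still holds a copy
  }
 private:
  std::unordered_map&amp;lt;std::string, std::shared_ptr&amp;lt;RobotsTxt&amp;gt;&amp;gt; cache_;
};
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Entries would be created with &lt;code&gt;make_shared&lt;/code&gt;, for the reasons covered below.&lt;/p&gt;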

&lt;p&gt;The cost is real though. Each &lt;code&gt;shared_ptr&lt;/code&gt; stores two pointers: one to the object and one to a control block that holds the reference count. That's 16 bytes on a 64-bit system instead of 8 bytes for a raw pointer. Incrementing and decrementing the reference count uses atomic operations for thread safety. Atomic operations are significantly slower than regular integer operations because they need to coordinate across CPU cores. Creating a &lt;code&gt;shared_ptr&lt;/code&gt; with the naive approach allocates memory twice: once for the object and once for the control block.&lt;/p&gt;

&lt;p&gt;Use &lt;code&gt;make_shared&lt;/code&gt; to fix the double allocation problem. It allocates the object and control block in one contiguous chunk, cutting allocation overhead in half and improving cache locality since the object and its metadata sit next to each other in memory.&lt;/p&gt;

&lt;p&gt;Don't default to &lt;code&gt;shared_ptr&lt;/code&gt; because it seems easier than thinking about ownership. Shared ownership makes reasoning about lifetimes harder. When ten different components all own something, figuring out when it actually gets deleted requires tracking all ten owners. Use &lt;code&gt;shared_ptr&lt;/code&gt; only when you actually need multiple owners, like caches where clients need to keep using objects even after eviction, callbacks that outlive the code that registered them, or async operations where multiple threads need access to shared state.&lt;/p&gt;

&lt;h2&gt;
  
  
  weak_ptr: breaking cycles
&lt;/h2&gt;

&lt;p&gt;The crawler will represent the web as a graph of pages. Each page object stores its URL, parsed content, and references to other pages it links to. If I use &lt;code&gt;shared_ptr&lt;/code&gt; for these outbound links, I create circular references. Page A links to Page B, which links back to Page A. Both hold &lt;code&gt;shared_ptr&lt;/code&gt;s to each other. The reference counts never hit zero. Memory leaks despite using smart pointers.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;weak_ptr&lt;/code&gt; solves this. It holds a non-owning reference to an object managed by &lt;code&gt;shared_ptr&lt;/code&gt;. It doesn't increment the reference count. The object can be deleted while &lt;code&gt;weak_ptr&lt;/code&gt;s still reference it. Before using a &lt;code&gt;weak_ptr&lt;/code&gt;, convert it to a temporary &lt;code&gt;shared_ptr&lt;/code&gt; by calling &lt;code&gt;lock()&lt;/code&gt;. This returns an empty &lt;code&gt;shared_ptr&lt;/code&gt; if the object was already deleted, or a valid &lt;code&gt;shared_ptr&lt;/code&gt; if it still exists.&lt;/p&gt;

&lt;p&gt;The page cache will own pages with &lt;code&gt;shared_ptr&lt;/code&gt;. When I add a link from one page to another, the source page stores a &lt;code&gt;weak_ptr&lt;/code&gt; to the target. The target's reference count doesn't increase. When the cache evicts the target page, that page gets deleted even though other pages still reference it. The &lt;code&gt;weak_ptr&lt;/code&gt;s don't keep it alive.&lt;/p&gt;

&lt;p&gt;When I need to traverse the graph and visit all pages a given page links to, I'll iterate through its &lt;code&gt;weak_ptr&lt;/code&gt; list and call &lt;code&gt;lock()&lt;/code&gt; on each one. If the target page still exists, &lt;code&gt;lock()&lt;/code&gt; returns a valid &lt;code&gt;shared_ptr&lt;/code&gt; and I can access the URL. If the target was deleted, &lt;code&gt;lock()&lt;/code&gt; returns empty and I skip it. The code handles missing pages gracefully without crashes or undefined behavior.&lt;/p&gt;
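
&lt;p&gt;A sketch of that traversal (&lt;code&gt;Page&lt;/code&gt; is a placeholder type):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;#include &amp;lt;memory&amp;gt;
#include &amp;lt;string&amp;gt;
#include &amp;lt;vector&amp;gt;

struct Page {
  std::string url;
  std::vector&amp;lt;std::weak_ptr&amp;lt;Page&amp;gt;&amp;gt; links;  // outbound edges don't keep targets alive
};

std::vector&amp;lt;std::string&amp;gt; live_link_urls(const Page &amp;amp;page) {
  std::vector&amp;lt;std::string&amp;gt; urls;
  for (const auto &amp;amp;link : page.links) {
    if (std::shared_ptr&amp;lt;Page&amp;gt; target = link.lock()) {  // promote to a temporary owner
      urls.push_back(target-&amp;gt;url);                      // safe: target stays alive for this block
    }
    // lock() returned empty: the cache evicted that page, so it's skipped
  }
  return urls;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;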

&lt;p&gt;This pattern shows up everywhere in large systems. Parent-child relationships use it: parents own children with &lt;code&gt;shared_ptr&lt;/code&gt;, children reference parents with &lt;code&gt;weak_ptr&lt;/code&gt;. Otherwise parents and children would keep each other alive forever. Observer patterns use it: the subject being observed is owned elsewhere, observers hold &lt;code&gt;weak_ptr&lt;/code&gt; so they don't prevent the subject from being deleted. Caches use it: the cache uses &lt;code&gt;shared_ptr&lt;/code&gt; for ownership, clients get &lt;code&gt;weak_ptr&lt;/code&gt; so they can access objects but don't prevent eviction.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why raw pointers still exist
&lt;/h2&gt;

&lt;p&gt;Raw pointers aren't gone. They're for non-owning references within a limited scope. When a function takes a parameter it doesn't own and won't outlive the call, use a raw pointer or reference.&lt;/p&gt;

&lt;p&gt;The crawler will have a function that processes HTML given a parser. The function doesn't own the parser and doesn't need to keep it alive. It just needs to use it during the function call. Passing a raw pointer or reference is perfect here. The caller owns the parser, the processing function borrows it. When processing completes, the parser goes back to being owned by the caller.&lt;/p&gt;

&lt;p&gt;The rule became: smart pointers for ownership, raw pointers for borrowing. The type system documents who's responsible for cleanup. A function taking &lt;code&gt;unique_ptr&lt;/code&gt; by value takes ownership. A function taking &lt;code&gt;shared_ptr&lt;/code&gt; by value shares ownership. A function taking a raw pointer borrows without ownership. You can see the memory management contract in the function signature.&lt;/p&gt;
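
&lt;p&gt;The contracts, side by side (signatures only, names illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;#include &amp;lt;memory&amp;gt;

struct HtmlParser;  // placeholder

void take_ownership(std::unique_ptr&amp;lt;HtmlParser&amp;gt; parser);   // caller hands the parser over for good
void share_ownership(std::shared_ptr&amp;lt;HtmlParser&amp;gt; parser);  // caller and callee both own it
void borrow(const HtmlParser &amp;amp;parser);                      // callee just uses it during the call
void maybe_borrow(const HtmlParser *parser);                 // same, but "no parser" is a valid input
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;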

&lt;h2&gt;
  
  
  What this changes
&lt;/h2&gt;

&lt;p&gt;Smart pointers make ownership explicit in the type system. The cache will use &lt;code&gt;shared_ptr&lt;/code&gt; because multiple systems need access and it's unclear who finishes last. Tasks will use &lt;code&gt;unique_ptr&lt;/code&gt; because they have clear single owners. Links will use &lt;code&gt;weak_ptr&lt;/code&gt; to avoid cycles. The code will say what it does through the types instead of through comments and developer discipline.&lt;/p&gt;

&lt;p&gt;This approach showed up in Rust as the entire language design. Every type has ownership semantics enforced at compile time. You can't compile code that would cause a use-after-free. You can't accidentally create circular references. The borrow checker rejects programs with ambiguous ownership. C++ made smart pointers optional, letting you choose between manual memory management and automatic cleanup. Rust made ownership tracking mandatory, moving all these bugs from runtime to compile time.&lt;/p&gt;

&lt;p&gt;Go went the opposite direction and chose garbage collection. Memory management happens automatically at runtime through a concurrent mark-and-sweep collector. No ownership tracking needed. No thinking about when objects get deleted. You pay for this with GC pauses where the program stops to clean up memory, and less control over when cleanup actually happens. Each language learned from C++'s complexity and made different trade-offs based on their priorities.&lt;/p&gt;

&lt;p&gt;In modern C++, if you're writing &lt;code&gt;new&lt;/code&gt; and &lt;code&gt;delete&lt;/code&gt; by hand, you're writing C++98. The language moved on two decades ago. Use &lt;code&gt;make_unique&lt;/code&gt; for single ownership, &lt;code&gt;make_shared&lt;/code&gt; when you need multiple owners, and &lt;code&gt;weak_ptr&lt;/code&gt; to observe without owning. The ownership model becomes clear in the code instead of existing only in comments and documentation. The compiler handles the cleanup, and you get zero-overhead abstractions that cost nothing at runtime.&lt;/p&gt;

&lt;p&gt;The crawler isn't built yet, but the design decisions are already clear. Smart pointers make the ownership explicit before I write the implementation. College taught "be careful" with raw pointers. Modern C++ provides actual tools instead of advice.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Next: Templates - Why C++ compiles so slowly&lt;/em&gt;&lt;/p&gt;

</description>
      <category>cpp</category>
    </item>
  </channel>
</rss>
