DEV Community: kol kol

I Deleted 40,000 Lines of "Dead Code" — Production Broke in 3 Minutes

kol kol — Sat, 04 Jul 2026 14:02:49 +0000

We all hate dead code. It's the junk drawer of your codebase — nobody knows what it does, nobody's touched it in years, and it's just sitting there taking up space.

So when I ran our code coverage tool and it showed 40,000 lines with zero references, I felt like a hero. I was about to clean up this mess. The PR description was literally: "Housekeeping — removing unused code."

I was so wrong.

The Setup

Our codebase had grown organically over 5 years. Multiple teams, multiple rewrites, at least two "we'll clean it up later" phases that never happened.

The coverage report was clear: these functions, classes, and modules had zero callers. Zero imports. Zero references. The tool even showed me the exact files. I spent maybe 20 minutes verifying — clicked through a few call chains, searched for dynamic references. Nothing.

I opened the PR. One reviewer approved in 5 minutes. The other didn't even look. We merged on a Thursday at 4 PM. Classic.

What Happened Next

Minute 1-3: The Silence

Deploy went green. All tests passed. No alerts fired. I closed my laptop feeling productive.

Minute 4: The First Page

Slack notification: "Checkout is failing."

Not "checkout is slow." Not "checkout is weird." Failing. As in, customers couldn't buy things.

Minute 10: The Investigation

I looked at the error logs. The stack trace pointed to a file I had just deleted. But that's impossible — the coverage tool said nothing called it.

Then I found it.

The Problem: Dynamic References

One of our "legacy" payment integrations used eval() — yes, eval — to dynamically construct payment processor class names from a configuration database.

# config table had: "processor_class": "StripeLegacyProcessor"
processor = eval(f"{config.processor_class}(api_key)")

The coverage tool couldn't see it because the reference was a string in the database, not code. The static analysis had no way to know that "StripeLegacyProcessor" in a database row meant from legacy_payments import StripeLegacyProcessor.
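One way to make this kind of lookup visible to static tools is an explicit registry in place of eval(). This is a minimal sketch, not what our codebase contained: the StripeLegacyProcessor constructor signature, PROCESSOR_REGISTRY, and build_processor are all assumptions made for illustration.

```python
class StripeLegacyProcessor:
    """Stand-in for the real class; the constructor signature is assumed."""

    def __init__(self, api_key: str) -> None:
        self.api_key = api_key


# Hypothetical registry: every class a config row may name is listed here,
# so a deleted class shows up as a broken reference instead of a runtime error.
PROCESSOR_REGISTRY = {
    "StripeLegacyProcessor": StripeLegacyProcessor,
}


def build_processor(processor_class: str, api_key: str):
    """Resolve a class name taken from config without calling eval()."""
    try:
        cls = PROCESSOR_REGISTRY[processor_class]
    except KeyError:
        raise ValueError(f"Unknown processor class: {processor_class!r}") from None
    return cls(api_key)
```

With a registry like this, the class name appears in code as a real reference, and an unknown name in the database fails with a clear error rather than arbitrary code execution.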

But wait — it gets worse.

The Hidden Web

That one eval() was just the tip of the iceberg. Once I started searching for all dynamic references, I found:

Database-driven feature routing — feature flags stored as class names, resolved at runtime
Plugin system — a YAML file listed "active plugins" by class name
Admin dashboard — dynamically loaded report generators based on user permissions
Webhook handlers — URL paths mapped to handler classes via a JSON config

Each one was a string reference that my coverage tool couldn't see. Each one broke when I deleted the "dead" code.

The Fix

I reverted the entire PR in about 2 minutes. But the damage was done — we had maybe 15 minutes of checkout downtime, and I had to explain to the CTO why I'd broken the revenue pipeline for a "housekeeping" PR.

What I Learned

1. Coverage Tools Lie (About Dynamically Referenced Code)

Static analysis can only see static references. If your codebase uses any form of dynamic loading — reflection, eval, metaprogramming, configuration-driven instantiation — your coverage report is incomplete.

2. "Dead Code" Is Often Code That Gets Called Indirectly

Before deleting anything, I should have:

Searched for string literals matching class/function names
Checked configuration files, database seeds, and migration scripts
Looked for getattr(), eval(), exec(), importlib.import_module()
Asked the team that originally wrote the code
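The third check in that list can be partly automated. Here is a rough sketch using Python's standard ast module to list calls that can hide references. The function name find_dynamic_calls and the set of names it looks for are my own choices, and it will not find class names stored in databases or YAML files.

```python
import ast

# Call names that often hide a reference from static tools.
DYNAMIC_CALLS = {"eval", "exec", "getattr", "import_module", "__import__"}


def find_dynamic_calls(source: str, filename: str = "<string>"):
    """Return sorted (line, name) pairs for calls that can hide references."""
    hits = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if isinstance(func, ast.Name):
            # Plain call such as eval(...) or getattr(...)
            name = func.id
        elif isinstance(func, ast.Attribute):
            # Attribute call such as importlib.import_module(...)
            name = func.attr
        else:
            continue
        if name in DYNAMIC_CALLS:
            hits.append((node.lineno, name))
    return sorted(hits)
```

Running something like this over every file gives a list of places to read by hand before trusting any "zero references" report.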

3. The Real Dead Code Test Is Runtime, Not Static

A better approach:

Instrument the code — add logging to every "suspected dead" function
Wait — run in production for at least one full business cycle
Verify — only delete functions that logged zero calls over a meaningful period
Feature flag it first — gate the code behind a flag, disable the flag, watch for breakage
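The instrumentation step can be as small as a decorator. This is a sketch under assumptions: suspected_dead, call_counts, and legacy_discount are hypothetical names, and an in-process counter only covers one process, so a real setup would rely on the log lines shipped to wherever your logs are aggregated.

```python
import functools
import logging
from collections import Counter

logger = logging.getLogger("dead_code_audit")
call_counts = Counter()  # per-process tally, useful mainly for local checks


def suspected_dead(func):
    """Log and count every call to a function we believe is unused."""
    name = f"{func.__module__}.{func.__qualname__}"

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        call_counts[name] += 1
        logger.warning("suspected-dead code was called: %s", name)
        return func(*args, **kwargs)

    return wrapper


@suspected_dead
def legacy_discount(amount):
    # Hypothetical example function, not from the original codebase.
    return amount - 10
```

Only functions that stay silent in the logs for the whole observation window become deletion candidates.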

4. PR Culture Matters

My reviewers approved a 40,000-line deletion in minutes. That's a process failure. Large deletions should get the same scrutiny as large additions — maybe more, because the risk is invisible.

The Aftermath

We eventually cleaned up that codebase — but properly. We added runtime instrumentation, waited two weeks, identified the actually dead code (about 8,000 lines), and deleted it in small, verified batches.

The 32,000 lines that looked dead? They were all referenced dynamically. Every single one.

The Takeaway

Dead code is like a haunted house: it looks empty, but something's still living in there. Before you start demolishing walls, make sure nobody's home.

Next time I see "40,000 lines of unused code," I'm going to assume I just don't understand the codebase well enough yet.

Have you ever deleted code that wasn't actually dead? Share your war stories in the comments.

My CI/CD Pipeline Passed for 3 Months — Then I Read the Logs

kol kol — Fri, 03 Jul 2026 14:07:21 +0000

The green checkmark was our favorite color. Every PR. Every merge. Every deploy. All green.

Then one Tuesday afternoon, a user reported a feature that had been broken for weeks. Not hours. Weeks.

I opened the CI pipeline logs. And that's when I realized — our "passing" builds had been lying to us the entire time.

The Setup

Our pipeline looked textbook:

1. Lint → 2. Unit tests → 3. Integration tests → 4. Build → 5. Deploy

Every step passed. 100% success rate. For months.

The Discovery

A user filed a bug: "The export button doesn't work." I clicked it myself — nothing happened. No error message, no crash. Just... silence.

I traced it back to a PR from 11 weeks ago. The pipeline had passed. The code review had been approved. But the feature had been silently broken since day one.

Here's what went wrong — and it's not what you'd expect.

The Problem: Tests That Don't Test

Our integration test suite had a bug. A real bug, in the test code itself.

// What we thought we were testing:
describe('Export feature', () => {
  it('should export user data', async () => {
    const result = await exportUserData(userId);
    expect(result.status).toBe('success');
  });
});

// What was actually happening:
// The test mocked the ENTIRE export service.
// So it tested the mock, not the real code.
// The mock always returned { status: 'success' }.

The mock was configured at the module level. Every test that imported the export module got the mocked version — including the integration tests that were supposed to catch exactly this kind of bug.

The Root Cause: Over-Mocking

We had fallen into the classic over-mocking trap:

Unit tests: Mocked everything to isolate the unit → fine
Integration tests: Also mocked everything "for speed" → not fine
E2E tests: Didn't cover this specific flow → gap in coverage

Our integration tests were really just expensive unit tests. They verified that our mocks worked correctly. Not that our actual code worked.

The Fix

Three changes that made a real difference:

1. Mock Boundaries

Only mock at unit test level. Integration tests must hit real service boundaries — databases, APIs, file systems. If it's slow, that's a signal, not a problem to hide.

2. Contract Tests

Added contract tests between services. If the mock returns something the real service wouldn't, the contract test catches it.

// Contract test: verify mock matches real behavior
it('export mock matches real service contract', async () => {
  const mockResult = await mockExport(userId);
  const realResult = await realExport(userId);

  expect(Object.keys(mockResult)).toEqual(Object.keys(realResult));
  expect(typeof mockResult.data).toEqual(typeof realResult.data);
});

3. Pipeline Health Metrics

Started tracking not just pass/fail rates, but what the tests actually exercised. Coverage numbers went down initially (from 94% to 67% real coverage) — and that was the most honest metric we'd had in months.

The Real Lesson

A green pipeline doesn't mean your code works. It means your tests passed. Those are two different things.

The most dangerous bugs aren't the ones that break your pipeline. They're the ones your pipeline says are fine.

If your CI/CD is all green all the time, ask yourself: are my tests catching bugs, or just confirming that my mocks are consistent?

Questions I Ask Myself Now

When was the last time a test failed for a real bug?
Do my integration tests actually integrate?
If I removed all my mocks, how many tests would still pass?
Am I measuring test coverage or test confidence?

A pipeline that never fails isn't reliable. It's untested.

My Monitoring Dashboard Was All Green — While 80% of Users Got Errors

kol kol — Wed, 01 Jul 2026 14:04:45 +0000

Everything looked perfect. Response times? Under 200ms. Error rate? 0.3%. CPU? 12%. Memory? Fine. Every metric on the dashboard was painted in soothing green.

And yet, our support inbox was filling up with "the app isn't working" messages.

Here's what happened, how I found it, and the monitoring lesson that changed how I think about observability.

The Problem

It was a Tuesday. Our app had a feature that processed user-uploaded documents — convert, OCR, store. Simple pipeline. The monitoring dashboard showed everything healthy.

But users were complaining. Not loudly. Just a steady trickle of "my document didn't process" tickets.

I checked the dashboard again. All green.

The Investigation

I dug into the raw logs — not the aggregated metrics, but the actual request-level logs. And that's when I saw it:

20% of requests went through the main service → responded normally
80% of requests hit a load balancer rule I'd added weeks ago for "capacity management" → got silently routed to a staging queue that... nobody was consuming from

The requests weren't failing. They were being redirected to a dead end. No errors, no timeouts. Just... nothing.

Our monitoring was tracking:

✅ Response time (of the 20% that completed)
✅ Error rate (there were none — the requests just vanished)
✅ CPU/memory (the main service was barely working, because 80% of traffic was going elsewhere)

The dashboard was green because it was only measuring the healthy path.

The Root Cause

Three weeks earlier, I'd added a load balancer rule to route "overflow" traffic to a secondary processing queue during peak loads. The idea: prevent the main service from crashing under heavy document uploads.

The rule worked. It routed 80% of traffic to the secondary queue.

Nobody ever set up a consumer for the secondary queue.

The traffic wasn't failing. It was being politely escorted to a room with no exit. And our monitoring, which only tracked the main service, had no idea.

The Fix

Disabled the load balancer rule (immediate fix)
Set up a consumer for the secondary queue (proper fix — now both paths are monitored)
Added queue-depth monitoring (so we catch "traffic going somewhere but not being consumed" scenarios)
Created a "silent failure" runbook (a checklist for when metrics look fine but users report problems)
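The queue-depth check from the list above can be sketched as a comparison between two counter readings. The QueueSample fields and function names here are assumptions for illustration. How you read those counters depends on your queue system, and this sketch does not cover that part.

```python
from dataclasses import dataclass


@dataclass
class QueueSample:
    """One reading of a queue's counters; the field names are assumptions."""
    enqueued_total: int  # messages routed into the queue so far
    consumed_total: int  # messages a consumer has acknowledged so far


def queue_depth(sample: QueueSample) -> int:
    """Messages still waiting to be processed."""
    return sample.enqueued_total - sample.consumed_total


def is_silently_stalled(previous: QueueSample, current: QueueSample) -> bool:
    """True when traffic arrived during the interval but nothing consumed it."""
    arrived = current.enqueued_total - previous.enqueued_total
    consumed = current.consumed_total - previous.consumed_total
    return arrived > 0 and consumed == 0
```

An alert on is_silently_stalled would have fired within one sampling interval of my load balancer rule going live, because messages kept arriving while the consumed counter never moved.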

The Real Lesson

Green dashboards don't mean healthy systems. They mean your dashboards are measuring the things you told them to measure.

The things you didn't tell them to measure? Those are the things that will quietly break everything.

Here's my new rule: Every routing decision needs monitoring on both ends. If you send traffic somewhere, you need to know if it arrives and if it gets processed.

A routing rule without a corresponding monitor isn't just incomplete. It's dangerous. It gives you false confidence — the worst kind of confidence in production.

My "Silent Failure" Checklist

When users report problems but your dashboard is green:

[ ] Check raw logs, not just aggregated metrics
[ ] Trace a single request end-to-end (not just the happy path)
[ ] Look for traffic going somewhere unexpected (new routes, old rules, load balancer configs)
[ ] Check queue depths and consumer lag (requests might be waiting, not failing)
[ ] Ask: "What would success look like if this was broken in a way my dashboard couldn't see?"

The answer to that last question is usually the bug.

Your metrics are only as good as your imagination. If you can't imagine how something could silently fail, you won't monitor it. And if you don't monitor it, it will fail — quietly, confidently, while your dashboard stays green.

What's your "green dashboard but everything is broken" story?

I Thought It Was a "Quick Fix" — That 15-Minute Change Cost Me 3 Days of Debugging

kol kol — Thu, 25 Jun 2026 22:11:00 +0000

We've all said it. We've all believed it.

"It's just one line. Should take 15 minutes, max."

Here's the story of the time I was wrong — and the checklist I now use before touching anything in production.

The Setup

It was Thursday afternoon. A support ticket came in: users on the free tier weren't seeing their usage stats. The paid tier worked fine. The dashboard looked broken for a chunk of our user base.

I found the offending code in about 5 minutes:

// Before
const usage = await prisma.usage.findMany({
  where: { userId, tier: 'free' }
});

The issue was obvious — we'd recently migrated free users to a new tier label. The query was looking for tier: 'free' but the new value was 'basic'.

One-line fix. I typed it, ran the tests, and pushed.

// After (or so I thought)
const usage = await prisma.usage.findMany({
  where: { userId, tier: { in: ['free', 'basic'] } }
});

Tests passed. Deployed. Done. 15 minutes, start to finish.

I was so proud of myself.

The Crack Appears

By Friday morning, the dashboard was worse. Not just missing stats — now free tier users were seeing negative usage numbers.

Negative. Usage.

"How?" was my first question. "Why?" was my second. "How did nobody catch this?" was my third, directed at myself.

The Investigation

I spent Friday chasing ghosts:

Is it a Prisma bug? — No. Raw SQL queries showed the same numbers.
Is it a data corruption issue? — No. The raw data was clean.
Is it a timezone problem? — I wasted 2 hours on this. It wasn't.
Is the aggregation logic wrong? — Getting warmer...

Here's what I eventually found. The change I made — adding in: ['free', 'basic'] — didn't just change which records got fetched. It changed how many records got fetched.

The old code queried one table. But basic tier users had their usage data split across two tables (a migration artifact we never cleaned up). My "one-line fix" was now double-counting usage by pulling from both.

The negative numbers? A subtraction later in the pipeline that assumed single-table data. Double the input → the result dropped below zero.

The Root Cause

My fix treated a symptom (wrong tier label) as the whole problem. The real issue was:

We had a half-finished data migration (two tables for one concept)
The code had implicit assumptions about data shape
My tests only covered the happy path with seeded data
Nobody (including me) understood the full data flow

The one-line change was syntactically correct. It was semantically wrong.

What I Learned

1. "Quick fixes" are the most dangerous kind

When something feels trivial, that's exactly when you need to slow down. The pressure to "just fix it fast" is what causes production incidents.

2. Tests don't catch what they don't know about

My unit tests passed because they used clean, seeded data. They had no idea about the migration artifact in production. I now add this to my pre-deploy checklist:

□ Does this change touch migrated data?
□ Are there duplicate/legacy data sources?
□ What assumptions does downstream code make?
□ Have I tested with production-like data?

3. Read the code around the change, not just the change

I should have asked: "What happens to this data after it's fetched?" Instead, I looked at the one line, fixed it, and moved on.

Now I trace the data flow at least two steps upstream and downstream before making any change.

4. Write down what you changed and why

When you're deep in debugging at 2 AM three days later, you need context. A commit message like "fix tier label" is useless. "Update tier query to include 'basic' users — note: basic tier has legacy data in usage_v2 table" would have saved me hours.

The Real Fix

The actual solution wasn't a one-line change. It was:

Consolidate the data — Write a migration script to merge the two tables
Update the query — Point it at the consolidated source
Add a validation test — With production-like data distribution
Document the migration — So the next person knows why things look the way they do

It took a full day. But it was the right fix, not the fast one.

My New "Quick Fix" Checklist

Before I touch anything in production now, I run through this:

[ ] What changed upstream? (What feeds this code?)
[ ] What changed downstream? (What consumes this output?)
[ ] What assumptions am I making? (Write them down, verify each)
[ ] Are there edge cases in production data that tests don't cover?
[ ] Would someone else understand this change from the commit message?

It adds 5 minutes to every fix. And it's saved me from at least three more multi-day debugging sessions since then.

The cheapest bugs are the ones you prevent. The most expensive ones start with "it's just a quick fix."

What's your worst "quick fix" story? I'd love to hear I'm not alone in this. 🙃

I Swallowed My Errors for 6 Months — Then My Users Found Every Single One

kol kol — Thu, 25 Jun 2026 14:07:35 +0000

I spent 6 months thinking my error handling was "good enough."

I had try/catch blocks everywhere. I logged to the console. I even had a fancy Sentry dashboard.

But my users were still hitting broken flows — uploading files that vanished, payments that went through without confirmation, form submissions that silently failed.

The problem wasn't that I wasn't catching errors. It was that I was catching them and doing nothing.

The Silent Failure Pattern

Here's what my code looked like:

try {
  await uploadFile(file);
  await notifyUser("Upload complete");
} catch (error) {
  console.error("Upload failed:", error);
  // TODO: handle this better
}

That // TODO comment? It lived there for 6 months.

Meanwhile, every time a file upload failed:

The user saw a loading spinner that never resolved
Their file was never uploaded
They had no idea something went wrong
They tried again. And again. And gave up.

The Numbers That Woke Me Up

I finally dug into our analytics after a customer complained. Here's what I found:

23% of file uploads failed silently — roughly 1 in 4 uploads just disappeared
Average retry count: 2.7 — users tried almost 3 times before giving up
Zero error alerts — our Sentry dashboard showed "healthy" because we caught all the errors

We were running a silent disaster. Our dashboard was green, our users were bleeding.

The Fix Was Simple (But Not Easy)

I replaced every silent catch with this pattern:

try {
  await uploadFile(file);
  showSuccess("Upload complete!");
} catch (error) {
  if (error instanceof NetworkError) {
    showRetryDialog(file);
  } else if (error instanceof PermissionError) {
    showPermissionGuide();
  } else {
    showError("Something went wrong. Please try again.");
    logError(error, { context: "file-upload", userId });
  }
}

Key principles:

Never catch and do nothing — at minimum, log it with context
Tell the user — "something went wrong" is better than infinite loading
Offer a path forward — retry, contact support, try a different format
Categorize by error type — network issues need different handling than permission issues

The Results

After rolling this out:

Silent upload failures dropped from 23% to 0.4%
User-reported upload issues dropped 89%
Support tickets about "missing files" went from 15/week to 1/week

But the biggest win wasn't the numbers. It was that when something did go wrong, our team knew about it immediately instead of waiting for an angry email.

The Real Lesson

Error handling isn't about making your code not crash. It's about making failures visible — to users and to your team.

A crash is honest. A silent failure is a lie.

Every time you write catch (error) { console.log(error) } and move on, you're choosing to hide problems instead of solving them.

The next time you catch an error, ask yourself: "If this fails, who needs to know — and what should they do about it?"

If the answer is "nobody," you probably have a bigger problem than a missing error handler.

What's the worst silent failure you've found in production? Drop it in the comments — I want to feel less alone.

I Spent $500 on RAG Infrastructure Before Realizing These 7 Mistakes Were Killing My Results

kol kol — Sat, 20 Jun 2026 22:08:38 +0000

I built a RAG pipeline for private document search. It cost me $500 in vector database compute, weeks of debugging, and a lot of frustration. The results were mediocre — users got irrelevant answers, queries were slow, and the whole thing felt like a fancy keyword search with extra steps.

Then I audited the pipeline step by step. Turns out, I made 7 mistakes that are incredibly common in RAG systems. Fixing them transformed the pipeline from "meh" to genuinely useful.

Here's what I got wrong, and what I changed.

Mistake #1: I Chopped Documents Into Random Pieces

I was splitting documents by fixed token count — 512 tokens per chunk, done. Simple, right?

Wrong. I was destroying semantic context. A paragraph about API authentication would get split mid-sentence, with half in one chunk and half in another. When retrieval ran, the LLM got fragmented context and produced garbage.

The fix: Parent-Document retrieval with semantic chunking.

Split by natural document boundaries first (paragraphs, sections, headers) — these are your "parent documents"
Create smaller child chunks from parents for vector search
When a child chunk matches, return the full parent document to the LLM
Add 10-20% overlap between chunks so boundary information isn't lost

# What I should have done from the start
CHUNK_CONFIG = {
    "chunk_size": 1000,
    "chunk_overlap": 200,
    "separator": ["\n\n", "\n", "。", "！", "？"],
}

Query accuracy jumped 30% after this one change.
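To make the parent/child idea concrete, here is a stripped-down sketch. It is not my production code: it measures size in characters rather than tokens, splits parents on blank lines only, and the function names and default sizes are assumptions chosen to keep the example short.

```python
def split_parents(document: str) -> list[str]:
    """Split on blank lines so each parent is a whole paragraph."""
    return [p.strip() for p in document.split("\n\n") if p.strip()]


def split_children(parent: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Cut a parent into overlapping character windows for vector search."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    step = size - overlap
    stop = max(len(parent) - overlap, 1)
    return [parent[i:i + size] for i in range(0, stop, step)]


def build_index(document: str) -> tuple[list[str], list[tuple[str, int]]]:
    """Return the parents plus (child_chunk, parent_id) pairs to embed."""
    parents = split_parents(document)
    pairs = [
        (child, parent_id)
        for parent_id, parent in enumerate(parents)
        for child in split_children(parent)
    ]
    return parents, pairs
```

At query time you search over the child chunks, then use the stored parent_id to hand the full parent paragraph to the LLM.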

Mistake #2: I Used 0.5:0.5 Weights for Hybrid Search

My vector database supports hybrid search — combining vector similarity with keyword (BM25) matching. I left the weights at the default 50/50 split and assumed that was fine.

It wasn't. For technical documentation, exact keyword matches matter way more than the default acknowledges. Someone searching for "HNSW ef_construction" needs that exact term, not a semantically similar but wrong answer.

The fix: Dynamic weights based on query type.

Factual queries ("what is X"): 35% vector, 65% keyword
Semantic queries ("how do I build X"): 75% vector, 25% keyword
General queries: 60% vector, 40% keyword

WEIGHTS = {
    "factual": {"vector": 0.35, "keyword": 0.65},
    "semantic": {"vector": 0.75, "keyword": 0.25},
    "general": {"vector": 0.6, "keyword": 0.4},
}

The keyword weight bump for factual queries alone eliminated most of the "almost right but wrong" answers.
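Here is roughly how those weights get applied. Treat it as a sketch: classify_query is a deliberately crude prefix heuristic I made up for this example, and hybrid_score assumes both input scores are already normalized to the 0-1 range, which your vector database may or may not do for you.

```python
WEIGHTS = {
    "factual": {"vector": 0.35, "keyword": 0.65},
    "semantic": {"vector": 0.75, "keyword": 0.25},
    "general": {"vector": 0.6, "keyword": 0.4},
}


def classify_query(query: str) -> str:
    """Very rough heuristic; a real classifier would need tuning."""
    q = query.lower().strip()
    if q.startswith(("what is", "what are", "define")):
        return "factual"
    if q.startswith(("how do", "how can", "how to", "why")):
        return "semantic"
    return "general"


def hybrid_score(query: str, vector_score: float, keyword_score: float) -> float:
    """Blend two scores that are assumed to be normalized to 0-1."""
    w = WEIGHTS[classify_query(query)]
    return w["vector"] * vector_score + w["keyword"] * keyword_score
```

For a factual query, a chunk with a perfect keyword match and zero vector similarity still scores 0.65, which is what keeps exact-term hits near the top.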

Mistake #3: I Blew Up My Vector Database's Memory

I set ef_construction to the maximum value because "higher is better, right?" On a 50GB+ index, this meant the index build process consumed all available RAM and crashed. Twice.

The fix: Size-appropriate HNSW parameters.

# Don't max this out — your server will cry
HNSW_CONFIG = {
    "M": 16,              # connections per node (8-32 is the sweet spot)
    "ef_construction": 200,  # not 400. Not 1000. 200.
    "ef_search": 50,       # query time, not build time
}

Index build time went from "it crashed" to 45 minutes. Memory usage dropped 70%.

Mistake #4: My Embedding Model Was Too Generic

I was using a general-purpose embedding model trained on Wikipedia and web text. My documents were technical API references and engineering runbooks. The model didn't understand my domain.

The fix: Switch to a model fine-tuned for technical/code content. The difference was night and day — suddenly "migration" and "transform" weren't treated as synonyms just because they're sometimes related in general text.

Mistake #5: I Had No Query Rewrite Layer

Users typed natural questions like "why is my build slow" and the system searched for those exact words in technical documentation that said "CI pipeline optimization" and "build duration analysis." Zero overlap. Zero results.

The fix: A lightweight LLM query rewrite step before retrieval.

User query: "why is my build slow"
→ Rewritten: "CI pipeline performance optimization build duration"
→ Retrieved: Relevant documentation ✅

This single step improved recall by 40%. The cost? About 0.001 cents per query with a small model.

Mistake #6: I Didn't Filter Duplicate Context

Retrieving top-10 chunks meant I often got the same paragraph 3 times with slightly different wording. The LLM would repeat itself, hallucinate from the repetition, and produce bloated answers.

The fix: Maximal marginal relevance (MMR) re-ranking.

# Instead of returning top-10 most similar
# Return top-10 most similar AND diverse
retrieved = vector_store.search(query, k=20)
diverse = mmr_rerank(retrieved, query, lambda_param=0.7, k=10)

Answers became more concise and covered more ground.
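If your vector store doesn't ship an MMR helper, the algorithm is short enough to write yourself. This is a plain-Python sketch working directly on embedding vectors. The name mmr_select and its signature are my own for this example and differ from the mmr_rerank call shown above.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(query_vec, doc_vecs, lambda_param=0.7, k=10):
    """Return indices of up to k documents, trading relevance for diversity."""
    remaining = list(range(len(doc_vecs)))
    selected = []
    while remaining and len(selected) < k:

        def mmr(i):
            relevance = cosine(query_vec, doc_vecs[i])
            # Similarity to the closest document already picked.
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                default=0.0,
            )
            return lambda_param * relevance - (1 - lambda_param) * redundancy

        best = max(remaining, key=mmr)
        selected.append(best)
        remaining.remove(best)
    return selected
```

A lambda_param of 1.0 reduces this to plain similarity ranking, while lower values penalize near-duplicates of chunks that were already selected.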

Mistake #7: I Never Measured Retrieval Quality

I was evaluating the whole RAG pipeline end-to-end. If the final answer was bad, I didn't know if it was the retrieval, the prompt, or the LLM.

The fix: Separate retrieval evaluation.

Track hit rate: does the retrieved context contain the answer?
Track MRR (Mean Reciprocal Rank): how high in the results is the right chunk?
Build a golden test set of 100 query-document pairs
Only optimize the generation layer once retrieval scores are solid

This saved me from chasing the wrong problems for weeks.
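Both metrics are a few lines each. In this sketch, results maps each query to its ranked list of retrieved chunk IDs and golden maps each query to the one chunk ID that should come back. Those shapes and names are assumptions for the example, and a real golden set may have several correct chunks per query.

```python
def hit_rate(results: dict[str, list[str]], golden: dict[str, str], k: int = 10) -> float:
    """Fraction of queries whose expected chunk appears in the top k."""
    hits = sum(
        1 for query, expected in golden.items()
        if expected in results.get(query, [])[:k]
    )
    return hits / len(golden)


def mean_reciprocal_rank(results: dict[str, list[str]], golden: dict[str, str]) -> float:
    """Average of 1/rank of the expected chunk; a miss contributes 0."""
    total = 0.0
    for query, expected in golden.items():
        ranked = results.get(query, [])
        if expected in ranked:
            total += 1.0 / (ranked.index(expected) + 1)
    return total / len(golden)
```

A high hit rate with a low MRR tells you the right chunk is being retrieved but ranked too far down, which points at re-ranking rather than at chunking or embeddings.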

The Results After All 7 Fixes

Metric	Before	After
Answer relevance	~45%	~85%
Avg query latency	3.2s	1.8s
Monthly vector DB cost	$180	$95
Duplicate context in responses	60%	8%

The Takeaway

RAG isn't hard because the algorithms are complex. It's hard because there are 7+ interconnected knobs, and they all interact with each other.

My advice: fix chunking first, then weights, then embedding quality. In that order. Everything else is optimization.

What's your biggest RAG headache? Drop it in the comments — I've probably hit it too.

My API Broke Every January 1st — The Timezone Bug That Slipped Past Code Review

kol kol — Sat, 20 Jun 2026 14:04:46 +0000

My API broke at exactly 00:00 UTC on January 1st. Not the users' midnight — UTC midnight. Which meant our users in Tokyo had been living with broken data since 9 AM their time.

And the worst part? The tests all passed. The staging environment worked fine. It only broke in production, because production is in a different timezone than staging.

The Bug

Here's what the code looked like:

function getDailyReport(date) {
  const start = new Date(date).toISOString().split('T')[0];
  const end = new Date(start + 'T23:59:59Z');

  return db.reports.findMany({
    where: {
      createdAt: { gte: new Date(start), lt: end }
    }
  });
}

Seems fine, right? toISOString() gives you UTC. We're filtering by date. What could go wrong?

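Here's what went wrong: new Date(date) interprets a date-time string with no offset, such as "2026-01-01T00:00:00", in the server's local timezone. (A bare ISO date like "2026-01-01" is defined to parse as UTC, but date-times without an offset, and non-ISO formats such as "01/01/2026", are typically treated as local time.) In staging (UTC server), that input → 2026-01-01T00:00:00.000Z. In production (US-East server), the same input → 2026-01-01T05:00:00.000Z.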

Five hour offset. Every single date query. For an entire year before anyone noticed.

Why Tests Passed

Our CI runs in Docker containers set to UTC. Our staging server is also UTC. Our production server? US-East. The timezone mismatch was invisible until New Year's Day rolled around and the date boundary crossed the timezone offset.

Staging (UTC):     2026-01-01 → Jan 1 00:00 UTC ✅
Production (EST):  2026-01-01 → Jan 1 05:00 UTC ❌

We lost 5 hours of data on every query. The reports showed numbers that were "close enough" that nobody flagged it for 12 months.

The Fix

function getDailyReport(date: string) {
  // Append an explicit UTC time so parsing never depends on the server timezone
  const start = new Date(`${date}T00:00:00Z`);
  // Half-open window [start, start + 24h) so `lt` does not drop the last millisecond
  const end = new Date(start.getTime() + 24 * 60 * 60 * 1000);

  return db.reports.findMany({
    where: {
      createdAt: { gte: start, lt: end }
    }
  });
}

A small change. Append T00:00:00Z so the Date constructor parses the string as UTC. No more ambiguity.

The Real Fix (Process, Not Code)

The code fix took 30 seconds. The real fix took a week:

Added a timezone assertion in CI — our test suite now explicitly checks that process.env.TZ === 'UTC'. If anyone changes the CI timezone, tests fail.
Set TZ=UTC in all Dockerfiles — every container, every environment, same timezone. No surprises.
Added a timezone check to our deploy script — date +%Z must return UTC before deploy proceeds.
Wrote a linter rule — flags any new Date(string) where the string doesn't contain timezone info.

The Lesson

Timezone bugs are sneaky because they don't crash. They produce wrong data that looks right. Your users won't get an error page — they'll get silently incorrect numbers, and they'll trust them.

Three rules I now follow:

Never trust the system timezone. Always set TZ=UTC explicitly.
Never parse dates without timezones. "2026-01-01T00:00:00" depends on the server's local time. "2026-01-01T00:00:00Z" does not.
Never assume your CI timezone matches production. Assert it in your tests.

I've been coding for years. I still got bit by this. If it can happen to me, it can happen to you.

Read more developer war stories and technical deep-dives at codcompass.com

My API Broke Every January 1st — The Timezone Bug I Should Have Caught in Code Review

kol kol — Fri, 19 Jun 2026 22:02:33 +0000

My API broke at exactly 00:00 UTC on January 1st. Not the users' midnight — UTC midnight. Which meant our users in Tokyo had been living with broken data since 9 AM their time.

And the worst part? The tests all passed. The staging environment worked fine. It only broke in production, because production is in a different timezone than staging.

The Bug

Here's what the code looked like:

function getDailyReport(date) {
  const start = new Date(date).toISOString().split('T')[0];
  const end = new Date(start + 'T23:59:59Z');

  return db.reports.findMany({
    where: {
      createdAt: { gte: new Date(start), lt: end }
    }
  });
}

Seems fine, right? toISOString() gives you UTC. We're filtering by date. What could go wrong?

Five hour offset. Every single date query. For an entire year before anyone noticed.

Why Tests Passed

Staging (UTC):     2026-01-01 → Jan 1 00:00 UTC ✅
Production (EST):  2026-01-01 → Jan 1 05:00 UTC ❌

We lost 5 hours of data on every query. The reports showed numbers that were "close enough" that nobody flagged it for 12 months.

The Fix

function getDailyReport(date: string) {
  // Always append time to force UTC interpretation
  const start = new Date(`${date}T00:00:00Z`);
  const end = new Date(`${date}T23:59:59.999Z`);

  return db.reports.findMany({
    where: {
      createdAt: { gte: start, lt: end }
    }
  });
}

One line change. Append T00:00:00Z to force the Date constructor into UTC mode. No more ambiguity.

The Real Fix (Process, Not Code)

The code fix took 30 seconds. The real fix took a week:

Added a timezone assertion in CI — our test suite now explicitly checks that process.env.TZ === 'UTC'. If anyone changes the CI timezone, tests fail.
Set TZ=UTC in all Dockerfiles — every container, every environment, same timezone. No surprises.
Added a timezone check to our deploy script — date +%Z must return UTC before deploy proceeds.
Wrote a linter rule — flags any new Date(string) where the string doesn't contain timezone info.

The Lesson

Timezone bugs are sneaky because they don't crash. They produce wrong data that looks right. Your users won't get an error page — they'll get silently incorrect numbers, and they'll trust them.

Three rules I now follow:

Never trust the system timezone. Always set TZ=UTC explicitly.
Never parse dates without timezones. "2026-01-01" is ambiguous. "2026-01-01T00:00:00Z" is not.
Never assume your CI timezone matches production. Assert it in your tests.
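
To make the second rule concrete, here is a small sketch, not the actual linter rule. The helper hasExplicitZone is hypothetical. One detail worth knowing: JavaScript parses a date-time string with no offset as local time, while a bare ISO date like "2026-01-01" is parsed as UTC by the language spec. Other parsers and other languages do not necessarily agree, which is one more reason to spell the zone out.

```javascript
// Hypothetical check in the spirit of the linter rule: does an ISO-style
// date-time string carry an explicit zone (Z or a +hh:mm / -hh:mm offset)?
function hasExplicitZone(dateString) {
  return /T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:\d{2})$/.test(dateString);
}

// Explicit zone: the same instant on every machine.
const pinned = new Date('2026-01-01T00:00:00Z');

// No zone on a date-time string: parsed as local time, so the instant
// depends on the TZ of whatever machine runs this line.
const floating = new Date('2026-01-01T00:00:00');
```

On a UTC machine pinned and floating are equal; anywhere else they differ by the local offset.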

I've been coding for years. I still got bit by this. If it can happen to me, it can happen to you.

I Let AI Write My Backend Code for a Week — Here's What Actually Broke

kol kol — Sun, 14 Jun 2026 14:02:24 +0000

I told myself it would be fine. I had been using AI coding assistants for suggestions and autocomplete for months — and it worked great. So when a new project came up with a tight deadline, I thought: why not let AI handle the whole backend?

I set up a Cursor workspace, wrote a detailed spec, and hit generate. What followed was 5 days of "it compiles, but..." debugging that taught me more about software engineering than any tutorial ever did.

What Went Surprisingly Well

The boilerplate was genuinely impressive. In about 2 hours, I had:

A fully typed Express.js API with 12 endpoints
Zod validation schemas for every route
A Prisma schema with proper relations
Docker compose setup with Postgres and Redis

The code looked clean. Tests passed. I was feeling like a 10x developer.

The Cracks Started Showing

Bug #1: Silent Type Coercion

The AI generated this validation:

const userSchema = z.object({
  age: z.number(),
});

Looks fine, right? Except the frontend sent ages as strings, and z.number() does not coerce: given "25" it fails validation instead of converting it. Nothing flagged this in development, but in production users were getting 400 errors on signup.

Fix: z.coerce.number().int().positive() — but I had to find all 23 instances manually.

Bug #2: The N+1 Query Nobody Asked For

For a dashboard endpoint that listed users with their orders and order items, the AI generated:

const users = await prisma.user.findMany();
for (const user of users) {
  user.orders = await prisma.order.findMany({ where: { userId: user.id } });
}

Classic N+1. The Prisma docs literally have a page titled "How to avoid N+1 queries." With 500 users, this endpoint made 501 database queries and took 8 seconds.

Fix: include with nested relations — one query, 120ms.
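
The difference is easiest to see by counting round trips. Below is a minimal sketch using an in-memory stand-in for the database. createFakeDb, its methods, and the sample data are all hypothetical, and none of this is Prisma's API. It shows the batching idea: fetch the orders for every user in one call, then group them in memory.

```javascript
// Hypothetical in-memory stand-in for a database client that counts queries.
function createFakeDb(users, orders) {
  const db = {
    queryCount: 0,
    async findUsers() {
      db.queryCount += 1;
      return users.map((u) => ({ ...u }));
    },
    async findOrdersByUserIds(userIds) {
      db.queryCount += 1; // one round trip, like WHERE user_id IN (...)
      const wanted = new Set(userIds);
      return orders.filter((o) => wanted.has(o.userId));
    },
  };
  return db;
}

// Two queries total, no matter how many users there are.
async function loadUsersWithOrders(db) {
  const users = await db.findUsers();
  const orders = await db.findOrdersByUserIds(users.map((u) => u.id));

  // Group orders by user in memory instead of querying once per user.
  const byUser = new Map();
  for (const order of orders) {
    if (!byUser.has(order.userId)) byUser.set(order.userId, []);
    byUser.get(order.userId).push(order);
  }
  return users.map((u) => ({ ...u, orders: byUser.get(u.id) || [] }));
}
```

With 500 users the loop version issues 501 queries; this shape issues 2.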

Bug #3: Race Conditions in Token Refresh

The AI wrote a token refresh flow that looked perfect in isolation. But under load, concurrent refresh requests would invalidate each other's tokens. The AI's solution? "Add a retry mechanism." My solution? "Use a refresh token rotation pattern that handles concurrency properly."
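
One common way to stop concurrent refreshes from invalidating each other on the client is to let every caller share a single in-flight request. This sketch illustrates that idea only; it is not necessarily the pattern this project ended up with, and refreshFn is a hypothetical function that calls the refresh endpoint.

```javascript
// Wraps a refresh function so concurrent callers share one in-flight request.
function createSingleFlightRefresh(refreshFn) {
  let inFlight = null;
  return function refresh() {
    if (!inFlight) {
      inFlight = Promise.resolve()
        .then(refreshFn)
        .finally(() => {
          inFlight = null; // allow a new refresh once this one settles
        });
    }
    return inFlight;
  };
}
```

Three requests that notice an expired token at the same moment now produce one refresh call and all receive the same new token.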

Bug #4: The Error Handler That Swallowed Everything

catch (error) {
  console.log("Error:", error);
  res.status(500).json({ error: "Something went wrong" });
}

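console.log is not a logging strategy, and Error objects do not survive JSON serialization: message and stack are non-enumerable, so JSON.stringify(error) returns {}. Every production error was just {} in the logs. We ran like this for 3 days before anyone noticed.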

Fix: serialize errors explicitly and log through a real logging library (we went with Pino).
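
Here is a small sketch of why the logs were empty and what explicit serialization looks like. serializeError is a hypothetical helper written for illustration. Pino ships its own error serializer, so in practice you would lean on that rather than maintain your own.

```javascript
// JSON.stringify skips an Error's message and stack because they are non-enumerable.
const naive = JSON.stringify(new Error('payment declined'));

// Hypothetical helper: copy the useful fields into a plain object before logging.
function serializeError(error) {
  if (!(error instanceof Error)) {
    return { message: String(error) };
  }
  return {
    name: error.name,
    message: error.message,
    stack: error.stack,
    // Keep a nested cause if one was attached
    ...(error.cause !== undefined ? { cause: serializeError(error.cause) } : {}),
  };
}

const useful = JSON.stringify(serializeError(new TypeError('payment declined')));
```

Here naive is the string "{}", while useful carries the name, message, and stack.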

The Real Problem

Here's what I learned: AI generates code that's correct in isolation but fragile in context.

It doesn't know:

Your deployment architecture (so it misses N+1 queries)
Your traffic patterns (so it ignores race conditions)
Your logging infrastructure (so it uses the wrong logger)
Your team's conventions (so it mixes patterns)

The generated code passes tests because tests are narrow. It compiles because the syntax is valid. But production is where context matters.

What I Changed

AI writes the first draft, humans write the final version. I'm not going back to writing everything from scratch, but every PR now requires a manual review of control flow, error handling, and data access patterns.
Architecture decisions stay human. Schema design, caching strategy, and error handling patterns are too context-dependent to outsource.
Add integration tests that AI can't fake. Unit tests pass. Integration tests reveal the gaps. We added a test suite that runs the full API against a real Postgres instance.
Observability from day one. Structured logging, request tracing, and error tracking are now part of the project template, not an afterthought.

The Bottom Line

AI didn't break my project. My assumption that "generated code equals production-ready code" did.

AI is an incredible force multiplier when used as a pair programmer. It's a liability when treated as a replacement for engineering judgment.

The week cost me 3 extra days of debugging, but I shipped a more robust system than I would have built alone — because the AI's mistakes taught me where my own blind spots were.

Use AI. But keep your hands on the wheel.

Have you had similar experiences with AI-generated code? I'd love to hear your war stories in the comments.

Our Test Suite Passed 100% — Then Users Found 14 Bugs in One Day

kol kol — Tue, 09 Jun 2026 18:03:24 +0000

We had 847 tests. Green checkmarks across the board. 100% coverage on our critical paths. I was proud of that dashboard.

Then a user reported that our checkout was double-charging on Safari. Another said the password reset emails weren't arriving. Within 24 hours we had 14 confirmed bugs — and our CI pipeline was still proudly green.

That's when I realized: 100% code coverage is a vanity metric that makes you feel safe while your users burn.

The Illusion of Coverage

Here's what our test suite was great at:

Testing individual functions in isolation
Verifying happy paths with clean inputs
Catching regressions in pure utility functions

Here's what it completely missed:

Browser-specific behavior — Safari's date parsing is different from Chrome's. Our test runner used Node.js. No browser, no Safari.
Race conditions — Two API calls firing simultaneously? Our mocked fetch resolved instantly. In production, timing matters.
Integration gaps — Each module had tests. The connections between modules did not.
Real-world data — Our fixtures were clean. User data is never clean.

The Bug That Started It All

A user in Japan reported being charged twice for a single purchase. We couldn't reproduce it locally. Our payment integration tests passed every time.

The root cause: a double-submit button on slow networks. Our mock API responded in 12ms. Real networks: 800ms. That gap was enough for impatient fingers to click twice.

The fix was 3 lines of code:

const [isSubmitting, setIsSubmitting] = useState(false);
// Submit handler: if (isSubmitting) return; setIsSubmitting(true);
// Button: disabled={isSubmitting}

Three lines. But the test suite — our beautiful 847-test suite — had zero tests for this scenario because nobody wrote a test for "user clicks button twice."
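
The missing test is not hard to write once the mock is allowed to be slow. Below is a framework-free sketch: createSubmitGuard and chargeCard are hypothetical names, and the guard uses a plain flag rather than React state so that it flips synchronously on the first click.

```javascript
// Wraps a submit function so calls made while one is in flight are ignored.
function createSubmitGuard(submitFn) {
  let submitting = false;
  return async function guardedSubmit(...args) {
    if (submitting) return { skipped: true };
    submitting = true; // set synchronously, before any await
    try {
      return { skipped: false, value: await submitFn(...args) };
    } finally {
      submitting = false;
    }
  };
}

// "User clicks button twice" against a mock that takes a realistic amount of time.
async function simulateDoubleClick(delayMs) {
  let charges = 0;
  const chargeCard = async () => {
    await new Promise((resolve) => setTimeout(resolve, delayMs));
    charges += 1;
  };
  const submit = createSubmitGuard(chargeCard);
  const results = await Promise.all([submit(), submit()]);
  return { charges, results };
}
```

With the guard in place the second click is reported as skipped and the card is charged once.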

The 14-Bug Autopsy

After that incident, we categorized all 14 bugs:

Bug Category	Count	Why Tests Missed It
Browser compatibility	4	❌ No cross-browser tests
Race conditions	3	❌ Mocks too fast
Edge-case user input	3	❌ Fixtures too clean
Third-party API changes	2	❌ No contract testing
Time zone bugs	2	❌ All tests ran in UTC

14 bugs. Zero caught by CI. The problem wasn't that we didn't have enough tests — we had the wrong kind of tests.

What We Changed

1. Added Integration Tests at Module Boundaries

Unit tests check the bricks. Integration tests check the mortar. We added tests specifically for the connections between services — where most real bugs hide.

2. Started Running Tests in Real Browsers

We added Playwright for critical user flows: checkout, auth, search. These run against a real Chrome and Firefox instance. Safari is next.

3. Mock Network Latency

Instead of instant mock responses, we randomized delays between 100ms and 2000ms. This surfaced race conditions we never knew existed.
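
A minimal sketch of a latency wrapper for mocks. pickDelayMs and withLatency are hypothetical names, and the random source is injectable so a failing run can be replayed with fixed delays.

```javascript
// Pick a delay in [minMs, maxMs]; `random` is injectable so tests can pin it.
function pickDelayMs(minMs = 100, maxMs = 2000, random = Math.random) {
  return Math.round(minMs + random() * (maxMs - minMs));
}

// Wrap any mock so it resolves only after a randomized delay.
function withLatency(mockFn, options = {}) {
  const { minMs = 100, maxMs = 2000, random = Math.random } = options;
  return async function delayedMock(...args) {
    const delay = pickDelayMs(minMs, maxMs, random);
    await new Promise((resolve) => setTimeout(resolve, delay));
    return mockFn(...args);
  };
}
```

Wrapping an existing mock is one line, for example withLatency(mockFetch), where mockFetch stands for whatever mock the suite already uses.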

4. Contract Testing for APIs

We used Pact to verify that our frontend's expectations of backend APIs actually match reality. Two bugs disappeared the day we added this.

5. Time Zone Roulette

We randomize the test runner's timezone. Half our date bugs appeared within the first week.
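
A sketch of how the roulette could be wired up, assuming a Node.js runner where this executes before any test creates a Date. The zone list and the TEST_TZ override variable are assumptions for illustration. Logging the chosen zone matters, because a random failure that cannot be replayed is not much use.

```javascript
// Candidate zones are an assumption; pick ones that matter to your users.
const CANDIDATE_ZONES = [
  'UTC',
  'America/New_York',
  'Asia/Tokyo',
  'Europe/Berlin',
  'Pacific/Auckland',
];

function pickTimezone(zones = CANDIDATE_ZONES, random = Math.random) {
  const index = Math.min(zones.length - 1, Math.floor(random() * zones.length));
  return zones[index];
}

// Sets TZ for the run. A hypothetical TEST_TZ variable pins the zone so a
// failure can be reproduced with the exact zone that triggered it.
function applyRandomTimezone(env = process.env, random = Math.random) {
  const zone = env.TEST_TZ || pickTimezone(CANDIDATE_ZONES, random);
  env.TZ = zone;
  console.log(`Running tests with TZ=${zone}`);
  return zone;
}
```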

The New Philosophy

Coverage tells you what code runs. It doesn't tell you what breaks.

Now we track different metrics:

Bug escape rate — bugs found by users vs. caught in CI
Mean time to detection — how fast our tests find regressions
Integration test coverage — not line coverage, but scenario coverage

Our total test count went down (we deleted 200+ redundant unit tests). Our bug escape rate went down 80%.

The dashboard looks less impressive. The product works better.

Have you been burned by "green tests, broken production"? What testing gaps surprised you most? I'd love to hear your war stories in the comments.

I Added 20 Indexes to "Fix" Slow Queries — My Database Got 3x Slower

kol kol — Mon, 08 Jun 2026 14:02:21 +0000

Six months ago, I inherited a PostgreSQL database that was choking on production traffic. API response times hit 8 seconds. Users were timing out. The ops team was getting paged at 2 AM.

So I did what any "experienced" developer would do: I added indexes. Lots of them.

Twenty indexes across twelve tables. Problem solved, right?

Wrong. The database got slower. Write operations crawled. Disk usage spiked. And the queries I was trying to optimize? They were still slow.

Here's what I learned the hard way about index tuning — and the process I use now that actually works.

The Mistake Everyone Makes

The biggest misconception about indexes is this: more indexes = faster queries.

PostgreSQL has to maintain every index on every write. Add an index, and every INSERT, UPDATE, and DELETE gets heavier. With 20 extra indexes, our write-heavy analytics table was spending more time updating indexes than storing data.

But the real killer was something I didn't expect: index bloat.

What Actually Went Wrong

1. I Indexed Low-Cardinality Columns

I put an index on a status column with only 4 possible values: pending, active, suspended, deleted.

PostgreSQL's query planner looked at that index, saw that each value matched ~25% of rows, and decided a full table scan was cheaper. The index was dead weight — costing disk space and write performance, providing zero read benefit.

2. I Created Redundant Indexes

I had:

CREATE INDEX idx_user_email ON users(email);
CREATE INDEX idx_user_email_name ON users(email, name);

The second index already covers queries on email alone. The first one was pure redundancy. PostgreSQL was maintaining two indexes for essentially the same lookup.
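
This kind of redundancy can be spotted mechanically. The sketch below assumes you have already pulled index definitions into plain objects; the shape { name, table, columns } is hypothetical. It only handles plain column indexes, so unique, partial, and expression indexes need a human look before anything gets dropped.

```javascript
// An index is prefix-redundant when its columns are the leading columns
// of a longer index on the same table.
function findPrefixRedundantIndexes(indexes) {
  const redundant = [];
  for (const a of indexes) {
    for (const b of indexes) {
      if (a === b || a.table !== b.table) continue;
      const isPrefix =
        a.columns.length < b.columns.length &&
        a.columns.every((col, i) => col === b.columns[i]);
      if (isPrefix) redundant.push({ drop: a.name, coveredBy: b.name });
    }
  }
  return redundant;
}

const report = findPrefixRedundantIndexes([
  { name: 'idx_user_email', table: 'users', columns: ['email'] },
  { name: 'idx_user_email_name', table: 'users', columns: ['email', 'name'] },
]);
```

Column order matters here: an index on (name, email) would not make idx_user_email redundant, because email is not its leading column.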

3. I Ignored Partial Indexes

Our orders table had millions of rows, but 90% were completed or cancelled. The slow queries were all looking for status = 'pending'. What they needed was a partial index:

CREATE INDEX idx_orders_pending 
ON orders(created_at, customer_id) 
WHERE status = 'pending';

This tiny index (10% of the table) outperformed my full-table indexes by 5x.

The Fix: A Methodical Index Audit

Here's the process I followed to undo the damage and actually optimize:

Step 1: Find Unused Indexes

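SELECT
  s.schemaname,
  s.relname AS table_name,
  s.indexrelname AS index_name,
  s.idx_scan,
  pg_size_pretty(pg_relation_size(s.indexrelid)) AS index_size
FROM pg_stat_user_indexes s
JOIN pg_index i ON i.indexrelid = s.indexrelid
WHERE s.idx_scan = 0
  AND NOT i.indisunique  -- unique indexes enforce constraints even if never scanned
ORDER BY pg_relation_size(s.indexrelid) DESC;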

This revealed 14 indexes that had never been used since the last stats reset. I dropped them immediately.

Step 2: Find the Real Slow Queries

Instead of guessing, I used pg_stat_statements:

SELECT 
  query,
  calls,
  mean_exec_time,
  total_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;

This showed me which queries were actually burning CPU time. Not the ones I assumed were slow — the ones that actually were.

Step 3: Use EXPLAIN ANALYZE

For every slow query, I ran EXPLAIN ANALYZE to see the actual execution plan. Not EXPLAIN — EXPLAIN ANALYZE. The difference is that EXPLAIN ANALYZE actually runs the query and shows real timing data.

What I found: PostgreSQL was doing sequential scans on tables where I had indexes, because my query conditions didn't match the index column order.

Step 4: Build Right-Sized Indexes

The three indexes that actually made a difference:

-- Composite index matching the actual WHERE + ORDER BY pattern
CREATE INDEX idx_analytics_date_type 
ON analytics(event_date, event_type) 
WHERE event_date > '2026-01-01';

-- Covering index that includes all needed columns (no table lookup)
CREATE INDEX idx_users_lookup 
ON users(email) 
INCLUDE (name, created_at);

-- Expression index for a common pattern
CREATE INDEX idx_orders_lower_email 
ON orders(LOWER(customer_email));

The Results

Metric	Before Audit	After Audit
Total indexes	47	18
Avg query time	3.2s	0.4s
Write latency	180ms	25ms
Index disk usage	12.4 GB	2.1 GB
Index cache hit rate	67%	94%

The Rule I Follow Now

Never add an index without running EXPLAIN ANALYZE first.

Every index should have a specific query it's designed to accelerate. If you can't point to the query and show the before/after execution plan, don't create the index.

Indexes are not a "just in case" thing. They're a surgical tool. Use them like one.

Have you ever made your database slower by trying to optimize it? What was your wake-up call?

I Thought My API Was Rate-Limited — Until Someone Scraped 2 Million Requests in 4 Hours

kol kol — Sun, 07 Jun 2026 14:04:53 +0000

I had express-rate-limit installed. I had it configured. I had tests that proved it worked.

And yet, someone still scraped 2 million API requests from my production server in under 4 hours. Costing me $4,200 in upstream API calls.

Here's exactly what went wrong, how I found out, and the architecture I use now.

The Setup That Lied to Me

My API was a simple Express app. I added rate limiting like any reasonable developer would:

import rateLimit from 'express-rate-limit';

const limiter = rateLimit({
  windowMs: 15 * 60 * 1000, // 15 minutes
  max: 100,                  // 100 requests per window
  standardHeaders: true,
  legacyHeaders: false,
});

app.use('/api/', limiter);

Tests passed. I saw X-RateLimit-Limit: 100 in curl responses. I slept well.

The problem? I was running 4 instances behind a load balancer, and each instance kept its own in-memory counter. So the real limit was up to 400 requests per 15 minutes per IP, not 100.

And the attacker wasn't sending 100 requests from one IP. They were rotating through a proxy pool of 2,000+ IPs.
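
The arithmetic is worth spelling out. This is a back-of-the-envelope sketch that assumes the load balancer spreads each IP's requests evenly across instances; the function name is hypothetical.

```javascript
// Worst-case request volume that per-instance, per-IP limiting still lets through.
function worstCaseRequestsPerWindow({ perInstanceMax, instances, distinctIps }) {
  const perIp = perInstanceMax * instances; // each instance counts separately
  return { perIp, total: perIp * distinctIps };
}

const exposure = worstCaseRequestsPerWindow({
  perInstanceMax: 100,
  instances: 4,
  distinctIps: 2000,
});
```

With these inputs, exposure.perIp is 400 and exposure.total is 800,000 requests per 15-minute window, all without a single limiter ever tripping.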

How It Happened

At 2:47 AM, our monitoring dashboard showed something odd: API request volume spiked 800%. I dismissed it as a newsletter push going out.

By 4:00 AM, the database connection pool was saturated. Queries that normally took 12ms were timing out at 30 seconds.

By 6:30 AM, I checked our upstream LLM provider bill. We'd made 2.1 million API calls since midnight. At $0.002 per call, that's roughly $4,200.

The attacker was:

Hitting our search endpoint with systematic keyword variations
Rotating IPs from a residential proxy network
Staying under the per-IP limits by spreading requests across IPs
Extracting structured data from our responses

Why My Defenses Failed

Defense	Why It Failed
express-rate-limit (in-memory)	Not shared across instances
IP-based limiting	Proxy rotation defeated it
No request logging depth	Couldn't trace the attack pattern
No anomaly alerts	800% spike looked like "normal traffic"

The fundamental mistake: I treated rate limiting as a configuration problem instead of an architecture problem.

The Fix: Distributed Rate Limiting

I rebuilt the system with three layers:

Layer 1: Redis Sliding Window (The Real Rate Limiter)

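import Redis from 'ioredis';

const redis = new Redis(process.env.REDIS_URL);

async function checkRateLimit(key, max, windowSec) {
  const now = Date.now();
  const windowStart = now - windowSec * 1000;
  const member = `${now}-${Math.random()}`;

  // Sorted set = true sliding window. MULTI runs all four commands as one
  // atomic unit, so concurrent requests can't both read the same count.
  const results = await redis
    .multi()
    .zremrangebyscore(key, 0, windowStart) // drop entries older than the window
    .zadd(key, now, member)                // record this request
    .zcard(key)                            // count requests in the window
    .expire(key, windowSec)                // clean up idle keys
    .exec();

  const count = results[2][1]; // exec() returns [error, result] per command

  if (count > max) {
    await redis.zrem(key, member); // rejected requests don't consume quota
    return { allowed: false, remaining: 0 };
  }

  return { allowed: true, remaining: max - count };
}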

This gives you a true 100-request limit across all instances, not 100 per instance.
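
Wiring checkRateLimit into Express could look roughly like the sketch below. The middleware factory, the key scheme, and the choice of response headers are assumptions for illustration. The checker is passed in as an argument so the same wiring can be exercised with a test double instead of a live Redis.

```javascript
// Express-style middleware factory. `checkRateLimit` is injected so the same
// wiring works with the Redis-backed function or a stub in tests.
function createRateLimitMiddleware(checkRateLimit, { max = 100, windowSec = 900 } = {}) {
  return async function rateLimitMiddleware(req, res, next) {
    try {
      const key = `ratelimit:${req.ip}`; // key scheme is an assumption
      const { allowed, remaining } = await checkRateLimit(key, max, windowSec);
      res.setHeader('RateLimit-Limit', String(max));
      res.setHeader('RateLimit-Remaining', String(remaining));
      if (!allowed) {
        res.setHeader('Retry-After', String(windowSec)); // conservative upper bound
        res.status(429).json({ error: 'Too many requests' });
        return;
      }
      next();
    } catch (err) {
      // Whether to fail open or closed when Redis is unreachable is a policy choice;
      // this sketch hands the error to the app's error handler.
      next(err);
    }
  };
}
```

Mounting it is then app.use('/api/', createRateLimitMiddleware(checkRateLimit)).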

Layer 2: Behavioral Fingerprinting

IP addresses are useless against proxy pools. Instead, I track:

Request pattern entropy — Are endpoints being hit in alphabetical order? That's a scraper.
Timing regularity — Requests every exactly 1.0 seconds? Bot.
Header consistency — Same User-Agent, same Accept-Encoding, same everything? Bot.

function calculateRequestEntropy(requests) {
  const endpoints = requests.map(r => r.path);
  const uniqueEndpoints = new Set(endpoints).size;
  // Low entropy = sequential/scraping pattern
  return uniqueEndpoints / endpoints.length;
}

// Entropy < 0.3 → likely scraping
// Entropy > 0.7 → likely human
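
Timing regularity can be scored in a similar spirit. This sketch uses the coefficient of variation of the gaps between requests; the function name and the idea of treating values near zero as suspicious are assumptions to tune against your own traffic.

```javascript
// Coefficient of variation (stddev / mean) of the gaps between consecutive
// request timestamps, in milliseconds. Near 0 means metronome-like timing.
function timingRegularity(timestamps) {
  if (timestamps.length < 3) return null; // not enough data to judge
  const gaps = [];
  for (let i = 1; i < timestamps.length; i++) {
    gaps.push(timestamps[i] - timestamps[i - 1]);
  }
  const mean = gaps.reduce((sum, g) => sum + g, 0) / gaps.length;
  if (mean === 0) return 0;
  const variance = gaps.reduce((sum, g) => sum + (g - mean) ** 2, 0) / gaps.length;
  return Math.sqrt(variance) / mean;
}
```

Requests arriving every exactly 1.0 seconds score 0, while human browsing with irregular pauses scores much higher.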

Layer 3: Cost-Based Circuit Breakers

This is the one that actually saves money:

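// Estimated upstream cost per request, in dollars
const endpointCosts = {
  '/api/search': 0.002,    // LLM call
  '/api/analyze': 0.015,   // Expensive LLM call
  '/api/health': 0,        // Cheap
};

const COST_THRESHOLD = 50; // Trip the breaker at $50/hr
let hourlyCost = 0;
let breakerOpen = false;   // Expensive routes check this before calling the LLM

// Start a fresh window every hour so the total really is "per hour"
setInterval(() => {
  hourlyCost = 0;
  breakerOpen = false;
}, 60 * 60 * 1000).unref();

function trackCost(endpoint) {
  hourlyCost += endpointCosts[endpoint] || 0;
  if (hourlyCost > COST_THRESHOLD && !breakerOpen) {
    // Auto-throttle expensive endpoints and alert once per window
    breakerOpen = true;
    slack.alert(`API cost spike: $${hourlyCost.toFixed(2)}/hr`); // slack: our alerting helper
  }
}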

When costs spike, expensive endpoints automatically throttle. You don't need to be awake at 3 AM to stop a bleeding wallet.

The Results After 30 Days

Metric	Before	After
Successful scrapes	2 incidents	0
Peak API cost/hr	$4,200	$12
False positive blocks	0	2 (tuned rules)
Legitimate user impact	N/A	None detected

The Real Lesson

Rate limiting isn't about setting a number. It's about understanding:

Your threat model — Who would want to scrape your API and why?
Your architecture — In-memory doesn't work in a distributed system. Period.
Your cost exposure — Know the dollar cost per endpoint, and set automatic circuit breakers.

The $4,200 mistake taught me that security theater — rate limiting that looks right but isn't — is worse than no rate limiting at all. It gives you confidence to deploy things that aren't actually protected.

Have you ever been bitten by a "working" defense that wasn't? What's your rate limiting setup? Drop it in the comments — I'm always looking for ways to improve mine.