DEV Community

Ben Halpern

👻 Do you have any horror stories to share? Spooky bugs, scary data leaks, horrifying code, etc. 🎃

Latest comments (49)

Bill White

I once spent the entire 2003 Christmas break tracking down a bug that turned out to be one character in a printf format specifier.

Leandro Gomes

We have an async job on Resque that expires passwords at a given interval. No one noticed that this job had stopped working for a long time. It had about 8M jobs enqueued to process. Someone fixed the queue and sent all the jobs off to be executed. We expired passwords for about 40% of our clients, and some of them received more than a thousand password expiration emails in one day. Shit happens...

Ben Hemphill

Hadoop 0.20.2 had a bug where fixing missing block replicas would not respect the rack-aware placement policy. Over the course of many years, we had lost enough drives that we started losing blocks whenever we lost a drive. It took us a while to figure out what was happening. Luckily, the data in Hadoop was not the primary source, but we had to recopy data from the origin, then copy all files in HDFS to a new HDFS cluster (almost every file had at least one block affected). Petabyte-scale copies don't happen quickly. :)

Daniel J. Summers

This was probably 15 years or so ago at this point (and deals with mainframes and COBOL; yes, I know I'm like a living "You're not connected" icon - and get off my lawn while you're at it).

On the UNISYS mainframe, they have a transaction processing side that's way more performant than the traditional model (think nginx vs. Apache). But, to really get the throughput for commonly-used programs, you use a concept called RTPS (Resident Transaction Processing System) - basically, instead of your program hitting STOP RUN and terminating, you GO TO a point near the top that reinitializes all your variables, then waits for the next input. The advantage of RTPS is that the operating system doesn't have to actually load the executable from disk; since it's already in memory, it just runs it.

Anyway, our current setup didn't need this. But, our "next" setup (my project) needed to clear 100 transactions a second, 100k per day. Loading what was (in effect) the security program 100x per second was crazy I/O; when most of these programs finished, they called a second program to display their output, and that doubled the I/O. So, the security program, the screen-based output program, and the paginated plain-text output programs were prime candidates for RTPS.

As part of a contract with UNISYS to make sure this all went smoothly, my employer actually sent me to work with them, and we established that we could make the security program run as part of RTPS, and it worked - it was really fast! I returned to my office, excited to put this code change on our development box so that everyone could start exercising it; this turned these programs into what we call today "long running processes," and I knew that exercise was crucial to getting a lot of the kinks worked out. When I got ready to activate them, I was really excited; I think my fingers were shaking as I typed the command to launch 5 copies of this program into RTPS. It worked! I listed them, and it showed 5 copies; I ran a transaction, and that worked too! Then, the dreaded

SESSION PATH CLOSED

arrived in my terminal. "Great - what an awesome time for the network to suck!" We gave it a few minutes (I had a small audience at this point), and I was able to get signed back in. I ran the command to put 5 copies of that program back in RTPS, and again, we lost connection within a few seconds.

Time to call the help desk. We dialed the number, and this is literally how they answered the phone...

"WHAT are you DOING?!?!"

Long story sh... er, not quite as long, there were two patches the mainframe needed that they had never bothered to load, because "no one uses that." We chastised them for not keeping us current on patches, and they obtained and loaded them. RTPS worked great after that. This kept the ghouls away, until the same organization provisioned us 25% of what we and UNISYS told them we'd need when we actually went enterprise-wide with this project...

(On the upside, I joined a rather elusive group of programmers who made the mainframe spontaneously reboot from something other than the reboot command (which they wisely never gave any of us) - an elite group known as "real programmers".)

Alyss 💜

I once not only deployed on a Friday...I changed the hosting server and underlying software AND THEN deployed on a Friday. 👻

Daniel J. Summers

Did it work out, or did you learn why the conventional wisdom about that can be summed up with "don't"?

Alyss 💜

It did work out, but I was enabled and supported by the manager/director from technical operations. I'm not ignorant of the conventional wisdom and I'm stability-conscious. There are nuances to the situation which I didn't feel the need to divulge in a light-hearted post.

Daniel J. Summers

I wasn't trying to imply otherwise. :) I just wondered if it all worked out; I'm glad it did.

Chris

One DB, 4500 stored procedures.

rendall • Edited

Not scary, but spooky good.

I once saw a self-contained function written in 10+ year-old Transact-SQL in a banking database, used to calculate if a date was Easter (literally IS_EASTER(DATE)...).

This was High Technomancy.

Per Wikipedia, Easter "falls on the Sunday following the full moon that follows the northern spring equinox". So, you can imagine the loops and mods and leap year and off-leap year calculating.

And, since the method of calculation depends on whether you're using the Gregorian or Julian calendar, of course there was a conditional "IF YEAR < 1752..." enclosing a whole other set of calculations.
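For a sense of what such a function has to do, here is a minimal Python sketch. It is not the bank's T-SQL, which I can't reproduce; it assumes the widely published "anonymous Gregorian" computus and Meeus's Julian algorithm, and the names gregorian_easter, julian_easter, is_easter and switch_year are mine. Note that the Julian branch returns a date in the Julian calendar, which is the one that would have been in effect before the switch.

```python
def gregorian_easter(year):
    """Return (month, day) of Easter Sunday in the Gregorian calendar."""
    a = year % 19                 # position in the 19-year lunar cycle
    b, c = divmod(year, 100)      # century, year within the century
    d, e = divmod(b, 4)           # century leap-year corrections
    f = (b + 8) // 25
    g = (b - f + 1) // 3
    h = (19 * a + b - d - g + 15) % 30    # locates the paschal full moon
    i, k = divmod(c, 4)
    l = (32 + 2 * e + 2 * i - h - k) % 7  # shifts forward to a Sunday
    m = (a + 11 * h + 22 * l) // 451
    month, day = divmod(h + l - 7 * m + 114, 31)
    return month, day + 1


def julian_easter(year):
    """Return (month, day) of Easter Sunday in the Julian calendar."""
    a = year % 4
    b = year % 7
    c = year % 19
    d = (19 * c + 15) % 30                # locates the paschal full moon
    e = (2 * a + 4 * b - d + 34) % 7      # shifts forward to a Sunday
    month, day = divmod(d + e + 114, 31)
    return month, day + 1


def is_easter(year, month, day, switch_year=1752):
    """True if the given date is Easter Sunday.

    switch_year mirrors the "IF YEAR < 1752" branch from the comment;
    dates before it are treated as Julian-calendar dates (an assumption).
    """
    calc = julian_easter if year < switch_year else gregorian_easter
    return (month, day) == calc(year)
```

With this sketch, is_easter(2024, 3, 31) is True and is_easter(2024, 4, 7) is False.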

James O'Donnell

In a previous life, I was tasked with trying to scrape an automated migration task together on the DB. It was a fairly organic operation that required me to work in production. There was a lot of back and forth between my target and production and I had to drop the target DB frequently.

The hours got long, the coffee ran out, and just as I had wrapped up the task, I decided to clean up for one last test run. I started by deleting the target DB. But it wasn't the target. It was production.

John Van Wagenen

That one time when the test credit card server actually processed payments... Turns out you had to use the test credit card on the test payment server or it'd still try to charge the card. Imagine that. Luckily we caught it in time before the processing went through.

And that other time when some small bug (I can't remember what it was anymore) prevented (a small subset of) payments from being processed for a day or so.

Those are probably my two biggest blunders.

As far as scary code goes, I was asked to rewrite some modules that were written years ago by a third party out of the country. Some of the things I saw in there really made me scratch my head and literally facepalm at times.

Olivier “Ölbaum” Scherler

In the third week at my new job, we had to import a multi-gigabyte database into MySQL on our development server, and the /var partition was too small to hold it. “Fortunately,” the machine was set up with LVM, so we could resize it (and /home, to make room). To avoid making silly mistakes with the CLI, we chose to use the GUI tool. After we downsized /home and upsized /var, everything was corrupt. The GUI had resized the partitions, but not the volumes, so /var was overlapping /home.

(Here you have to wonder about the point of having a GUI if it makes even sillier mistakes than those you’d have made in the CLI.)

Fortunately, we could retrieve the exact previous block counts of both partitions from the logs, and resize everything back to how it was before. But wait, it gets better: this time, the GUI chose to resize partition AND volume, so everything was still corrupt: /home was now smaller than the data it contained. So we had to re-re-resize the volumes, and fortunately everything was back to normal and no interesting data was lost.

Did I mention it was my third week on the job that I almost nuked the team’s sole development machine?

From this day on, every time Linux suggests I partition and set up LVM, I smash the Nope button.

Ian bradbury • Edited

That time I spent a week deconstructing and documenting a super complex algorithm in C, only to find that the last line was... return 0.5.

Still angry.

Max Antonucci

Understandable. I'm surprised your computer escaped intact after something like that.

Alex Pliutau

For about a month I had a bug where my Mac would sometimes click random parts of the screen: switching to another tab in Slack, opening a tab in the browser, etc. Then I found out it was my coworker's Bluetooth mouse, which was still paired with my laptop. I had used it once, so it was saved in the configuration.

Nemanja Petrovic

I once introduced a bug in our system, and for a whole day invoices on the website were not being created at all. It was D-Day...

Cole Diffin

Many years back I was involved in double-charging ~100 people's stored credit cards during a routine scheduled payment process. Turns out someone had inverted the logic in an if clause. I spent well into a Sunday night trying to track that down.

Max Antonucci • Edited

Also, here's a picture I share with my front-end developer friends to scare the crap out of them.

CSS selector executive order

Mac Siri

This is rather terrifying