DEV Community

Cover image for 👻 Do you have any horror stories to share? Spooky bugs, scary data leaks, horrifying code, etc. 🎃
Ben Halpern
Ben Halpern Subscriber

Posted on

👻 Do you have any horror stories to share? Spooky bugs, scary data leaks, horrifying code, etc. 🎃

Latest comments (46)

Collapse
 
hexhead profile image
Bill White

Once spent an entire Christmas break 2003 tracking down a bug that turned out to be one character in a printf format specifier.

Collapse
 
leandrogs profile image
Leandro Gomes

We have a async job on resque to expire passwords given an interval. No one have notice that this job stoped to work for a long time. It had about 8M jobs enqueued to process. Someone fixed the queue and send all jobs to be executed. We expired password for about 40% of our clients and some of them received more than a thousand password expiration emails in one day. Shit happens...

Collapse
 
bgadrian profile image
Adrian B.G.
Collapse
 
benhemphill profile image
Ben Hemphill

Hadoop 0.20.2 had a bug where fixing missing block replicas would not respect the rack aware placement policy. Over the course of many years we had lost enough drives to start losing blocks whenever we lost a drive. Took us a while to figure out what was happening. Luckily the data in hadoop was not the primary source, but we had to recopy data from origin. Then copy all files in HDFS to a new HDFS cluster (almost every file had at least one block affected) Petabyte scale copies don't happen quick. :)

Collapse
 
danieljsummers profile image
Daniel J. Summers

This was probably 15 years or so ago at this point (and deals with mainframes and COBOL; yes, I know I'm like a living "You're not connected" icon - and get off my lawn while you're at it).

On the UNISYS mainframe, they have a transaction process side that's way more performant than the traditional model (think nginx vs. Apache). But, to really get the throughput for commonly-used programs, you use a concept called RTPS (Resident Transaction Processing System) - basically, instead of your program hitting STOP RUN and terminating, you GO TO a point near the top that reinitializes all your variables, then waits for the next input. The advantage to RTPS is that the operating system doesn't have to actually load the executable from disk; since it's in memory, it just runs it.

Anyway, our current setup didn't need this. But, our "next" setup (my project) needed to clear 100 transactions a second, 100k per day. Loading what was (in effect) the security program 100x per second was crazy I/O; when most of these programs finished, they called a second to display their output, and that's doubling the I/O. So, the security program, the screen-based output program, and the paginated plain-text output programs are prime candidates for RTPS.

As part of a contract with UNISYS to make sure this all went smoothly, my employer actually sent me to work with them, and we established that we could make the security program run as part of RTPS, and it worked - it was really fast! I returned to my office, excited to put this code change on our development box so that everyone could start exercising it; this turned these programs into what we call today "long running processes," and I knew that exercise was crucial to getting a lot of the kinks worked out. When I got ready to activate them, I was really excited; I think my fingers were shaking as I typed the command to launch 5 copies of this program into RTPS. It worked! I listed them, and it showed 5 copies; I ran a transaction, and that worked too! Then, the dreaded

SESSION PATH CLOSED

arrived in my terminal. "Great - what an awesome time for the network to suck!" We gave it a few minutes (I had a small audience at this point), and I was able to get signed back in. I ran the command to put 5 copies of that program back in RTPS, and again, we lost connection within a few seconds.

Time to call the help desk. We dialed the number, and this is literally how they answered the phone...

"WHAT are you DOING?!?!"

Long story sh... er, not quite as long, there were two patches the mainframe needed that they had never bothered to load, because "no one uses that." We chastised them for not keeping us current on patches, and they obtained and loaded them. RTPS worked great after that. This kept the ghouls away, until the same organization provisioned us 25% of what we and UNISYS told them we'd need when we actually went enterprise-wide with this project...

(On the upside, I joined a rather elusive group of programmers who made the mainframe spontaneously reboot from something other than the reboot command (which they wisely never gave any of us) - an elite group known as "real programmers".)

Collapse
 
preciselyalyss profile image
Alyss 💜

I once not only deployed on a Friday...I changed the hosting server and underlying software AND THEN deployed on a Friday. 👻

Collapse
 
danieljsummers profile image
Daniel J. Summers

Did it work out, or did you learn why the conventional wisdom about that can be summed up with "don't"?

Collapse
 
preciselyalyss profile image
Alyss 💜

It did work out, but I was enabled and supported by the manager/director from technical operations. I'm not ignorant of the conventional wisdom and I'm stability-conscious. There are nuances to the situation which I didn't feel the need to divulge in a light-hearted post.

Thread Thread
 
danieljsummers profile image
Daniel J. Summers

I wasn't trying to imply otherwise. :) I just wondered if it all worked out; I'm glad it did.

Collapse
 
atldev profile image
Chris

One DB, 4500 stored procedures.

Collapse
 
s_anastasov profile image
Stojan Anastasov • Edited

I am not sure what that does... I am writing the Android client (using Kotlin), the server code is written in PHP :)

Collapse
 
rendall profile image
rendall • Edited

Not scary, but spooky good.

I once saw a self-contained function written in 10+ year-old Transact-SQL in a banking database, used to calculate if a date was Easter (literally IS_EASTER(DATE)...).

This was High Technomancy.

Per Wikipedia, Easter "falls on the Sunday following the full moon that follows the northern spring equinox". So, you can imagine the loops and mods and leap year and off-leap year calculating.

And, since the method of calculation depends on whether you're using the Gregorian or Julian calendar, of course there was a conditional "IF YEAR < 1752..." enclosing a whole other set of calculations.

Collapse
 
jasodonnell profile image
James O'Donnell

In a previous life, I was tasked with trying to scrape an automated migration task together on the DB. It was a fairly organic operation that required me to work in production. There was a lot of back and forth between my target and production and I had to drop the target DB frequently.

The hours got long, coffee ran out and just as I had wrapped up the task, I decided to clean up for one last test run. I started by deleting the target db. But it wasn't the target. It was production.