DEV Community

What was the worst bug you've ever written?

Ben Halpern on January 12, 2018

So I just wrote my worst bug ever. // Detect dark theme var iframe = document.getElementById('tweet-951949016245395457-957'); if (documen...


David Muckle • Jan 12 '18

Already told this story as a tweet responding to the above, but longer stories are always fun.

I set up an RPI Zero with a webcam to monitor my front door for potential visitors. I wanted to have it text me any time someone was at the door. Originally I tried to use the text-email gateway for my provider, but sending emails from the RPI just ended up looking like spam, and I didn't want to bother setting up some email relay, so I looked into using Twilio. Thankfully they have a nice little script you can use to text a given number; you just have to supply your API key and get a Twilio number to text from. The software I was using, motion, also has a bunch of nifty hooks, including one for when it detects an "event" of motion, the definition of which you can tune. Awesome! So, plug in the script to that, restart motion, and start waving my hand in front of the camera!

I then proceeded to get about 500 texts in the span of about a minute.

Eventually, after shutting everything down, I figured out what I did wrong. I had put the script into the on_motion hook, instead of the on_motion_event hook. The former hook is if you want to trigger something any time a frame of motion is detected. Wave your hand in front of that a bunch, and you're bound to trigger that hook many times. So not only did I manage to flood my phone with texts, but I also burnt through a good amount of Twilio trial credits.
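
A per-frame hook can also be made safer with a cooldown between notifications. The following is only a minimal sketch of that idea, not tied to motion or Twilio: `Cooldown` and `simulate` are hypothetical names, and the actual SMS call is left out.

```python
import time


class Cooldown:
    """Allow an action at most once per `interval` seconds."""

    def __init__(self, interval, clock=time.monotonic):
        self.interval = interval
        self.clock = clock
        self.last_fired = None

    def ready(self):
        now = self.clock()
        if self.last_fired is None or now - self.last_fired >= self.interval:
            self.last_fired = now
            return True
        return False


def simulate(frame_times, interval):
    """Count how many texts would go out for motion frames at the given times."""
    fake_now = [0.0]
    cooldown = Cooldown(interval, clock=lambda: fake_now[0])
    sent = 0
    for t in frame_times:
        fake_now[0] = t
        if cooldown.ready():
            sent += 1  # a real hook script would send the SMS here
    return sent
```

With a 60-second cooldown, 500 frames in under a minute would produce a single text instead of 500.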

Doug Stull • Jan 13 '18 • Edited

I once was testing sudoers, and happened to name the account f*ckyou@companydomain.com. Well I made a mistake in the sudoers setup and caused it to send the traditional: f*ckyou@companydomain.com is not authorized to the support email address that had about 60 ppl on it ranging from business to unix admins...

Ben Halpern • Jan 13 '18

This is pretty great

Rob Waller

@robdwaller

@ThePracticalDev @heroku That's nothing, I once wrote an email script to send 100,000 emails out. It sent one email to the first email, two to the second, three to the third, four to the fourth... I managed to turn the server off at about 1,000...

22:56 PM - 12 Jan 2018


Pietro Bongiovanni • Jan 18 '18

Had a similar one. I had just rewritten a bunch of cron jobs that send time sensitive emails.

There was one cron job that was supposed to run every minute and check the status of a bunch of records that were created 20 minutes before.

When I was testing this little script I changed the time span where the record needed to be created from 1 minute to something like three days and I forgot to change it back before deploying.

When we deployed the change everything went fine because the script wasn't supposed to run till the following morning at around 9am.

Well the morning after I got to the office and everybody was going around crazy because some customers had already received ~30 mails each. I ran to my computer and started killing the container that was sending the emails before going through the source code and finding where the problem was.

I sent around 10k emails in 10 minutes.

Never have I ever written a cron job without a unit test after that.

Shalvah • Mar 21 '18

Wait...cron jobs have unit tests?

sehe • Jul 4 '18

Yours don't?

Jake Casto • Jan 13 '18

I was messing with a (dumb) coworker. I told him that ‘rm -rF /‘ would fix all issues in his code. As a joke I added it to the top of the loader for our web app... I accidentally pushed it to a production server. Long story short I lost my job and had to rewrite everything.

Jason C. McDonald • Jan 13 '18

Yikes! This is exactly why we tell people not to make light of death commands. :P You make that mistake once, and never again!

Max Cerrina • Jan 13 '18

Such a benign version, but I still remember how super upset I was as a kid and someone told me alt+f4 would refresh the page or something.

𝚓𝚘𝚕𝚕𝚢 | 𝚠𝚒𝚕𝚌𝚘 • Jan 13 '18

"there's this really cool Gameboy cheat code for Tetris, if you press 'Select, Start, B, and A' all at the same time" is a really cool/cruel joke kids of the early 90s would tell each other ...

𝚓𝚘𝚕𝚕𝚢 | 𝚠𝚒𝚕𝚌𝚘 • Jan 13 '18

more related to your benign example (and even more benign): when i was first really tinkering with Linux back in like 2001/2002, i couldn't figure out how to get out of this program i was using (you'd think it would be Vi, but it was probably just a Man page) and a co-worker suggested Alt-F4-- while that did technically get me unstuck from my issue, it took me another day or so to figure out what i'd actually done

Jason C. McDonald • Jan 13 '18

Let me guess - erases scores?

𝚓𝚘𝚕𝚕𝚢 | 𝚠𝚒𝚕𝚌𝚘 • Jan 14 '18

it reboots the device!

Ben Sinclair • Apr 16 '18

On IRC it'd be "use /quit bugging me " to ignore someone.

Paula • Jan 13 '18

Damn, this is hardcore

Andy Zhao (he/him) • Jan 13 '18

Oh man... My heart really sank for you...

Jake Casto • Jan 13 '18

It did fix all code issues though :)

Ben Halpern • Jan 13 '18

Are you being serious? That is remarkable.

Jake Casto • Jan 13 '18

I’m dead serious.

𝚓𝚘𝚕𝚕𝚢 | 𝚠𝚒𝚕𝚌𝚘 • Jan 13 '18

that's the Chicago Fire method of fixing things-- burn it all down, and start over

Pietro Bongiovanni • Jan 18 '18

I once ran rm -rf / in a virtual machine just for fun to see what would happen.

The VM had some shared disks with my actual OS and it started deleting everything that I had on my desktop and some other important folders.

I was long due for a backup so I ended up losing a crap ton of data, but now I will never forget to check for shared disks when I'm working in a VM.

Michael Peyper • Jan 13 '18

I think you win this round of "who's made the biggest mistake"

Phil Ashby • Jan 15 '18

To your credit you identified a number of workflow issues doing that:

peer review anyone?
the code only /exists/ on production (where's the source control)?
firing you /before/ asking you to re-write seems pretty short sighted..

I hope you've got a job somewhere more thoughtful now :)

Jake Casto • Jan 18 '18

Thank you for trying to credit me, a few issues there haha. I was the primary reviewer, I had worked on everything so long that no one really bothered reviewing my code. We had two git repos, one development & one production. I pushed the commit to the production repo instead of the development one haha.

Dan Fellini • Apr 21 '18

Epic. I love it.

Jason C. McDonald • Jan 13 '18

Many years ago, I tried to use some deeply nested for loops in ActionScript 3.0 code I wrote to find the right combination of values in an XML template to match the hash of a given file. (The goal was to determine what exactly had changed in the file.)

About 20 minutes into waiting for it to finish, I decided to figure out how long it was going to take.

I determined mathematically it would finish on June 18, 2598 at 8:42 AM.

I learned two things that day:

1) One-way hashes are called that for a reason.

2) Deeply nested for loops are the work of the Devil anyway.

Ben Halpern • Jan 13 '18

Well how's the script coming along now? Any celebrations planned for 2598?

Jason C. McDonald • Jan 13 '18

I don't have that much patience, so I just aborted it instead. :P

Bruno Louzada • Jan 15 '18

I discovered INSERT with multiple rows. I tried inserting 250k lines into a remote database with one INSERT command per line, and it was going to take 5 days hahaha. With one INSERT and 250k rows it took < 5 mins.
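
The difference between the two approaches can be sketched as follows. This uses an in-memory SQLite database purely for illustration (the original database is not named); the `items` table, the function names, and the batch size of 400 are all assumptions.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
    return conn


def insert_one_by_one(conn, rows):
    # One statement per row; against a remote database each one is a round trip.
    for row in rows:
        conn.execute("INSERT INTO items (id, name) VALUES (?, ?)", row)
    conn.commit()


def insert_batched(conn, rows, batch_size=400):
    # One multi-row INSERT per batch: VALUES (?, ?), (?, ?), ...
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        placeholders = ", ".join(["(?, ?)"] * len(batch))
        params = [value for row in batch for value in row]
        conn.execute(f"INSERT INTO items (id, name) VALUES {placeholders}", params)
    conn.commit()


def count_rows(conn):
    return conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
```

Both functions end with the same table contents; the batched version simply issues far fewer statements, which is where the time goes when every statement crosses a network.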

Dian Fay • Jan 13 '18

I stopped the line at a major North American steel producer for the better part of a day.

marcellothearcane • Nov 22 '18

But how?

Dian Fay • Nov 22 '18

It's been over ten years and I don't remember the exact details. It probably had something to do with the fact that I was very much still learning basic SQL (at a co-op/internship kind of thing) and the company didn't have any kind of process controls for pushing to production until very shortly afterward.

Anthony Orona • Feb 22 '20

Haha that reminds me of my mess-up at my internship. I was working on my first big feature and iteratively pushing code into the master branch as I learned. My code was tagged for release and deployed to production with bugs like links not working properly etc. Not that big of a deal but really embarrassing for me.

Vaidehi Joshi • Jan 13 '18

I wrote a database migration to add a column and was as happy as a clam! I deployed it to production and thought everything would be fine.

Five minutes later I realize that my migration also involved updating every single user record and the database is locked and no one can access prod and OOPS 🙈

Vaidehi Joshi • Jan 13 '18

PS. You all did a great job handling the downtime ❤️

Ben Halpern • Jan 13 '18

Shalvah • Mar 21 '18

Yasss! I did this too. 😂😂

Evan Plaice • Jan 13 '18

Network communication bug on a custom ping-pong handshake protocol caused by stale state in an array overflow.

For a little perspective: I was writing a graphical multi-touchscreen front-end for an 80s-era commercial flight simulator. The sim host ran an early-80s-era real-time AIX environment. Booting the damn thing took about 15 minutes and involved following a 'bootstrap loader' procedure whereby you'd have to fat-finger the boot instructions into a keypad in hexadecimal. Ethernet networking didn't exist when it was built so we had to have an ethernet card custom built. On the software end we were working with an AIX specialist from the UK who designed the protocol, implemented a bare bones ethernet driver, and provided a client/server implementation on the host end.

Our AIX guy had done this before on other similar machines but it was my first journey into networking programming. I wouldn't have physical access to the host until integration so the AIX guy gave me a mini client/server simulator app written in C that I could train my client/server implementation against.

My end was entirely written in C#, where trying to do layer-2 networking is difficult enough. Either way, I found a PCAP wrapper (i.e. SharpPcap) to hack together a networking protocol, translated a floating point format converter from C (the host didn't use IEEE 754), reverse engineered a raw dump of the symbol table and loaded it into a DB, etc...

Integration finally came and I was a nervous wreck. We had 7 full days of downtime to make the changes, including ripping out the old hardware (ie ray-tracing displays + 100+ buttons) and replacing it with the new (3 touch screens). Flight simulators are the backbone of an airline, pilots who can't keep up with the FAA requirements for required training are grounded until they can. Therefore, flight simulators typically operate 20 hours/day, 7 days/week. Downtime beyond our scheduled window was not an option.

Months of preparation and death-march sprinting was finally yielding results. Everything was working brilliantly to the point where we could even start identifying opportunities to make adjustments and performance improvements.

That was, until somebody noticed something strange happening on the interface. Instead of updating to show the current state of the host as expected, a small select number of labels were constantly toggling between the correct value and something else. The symptom only occurred on a few pages, and only when the pages were loaded in a specific order.

I racked my brain for hours, chugging coffee and poring over every detail of my networking code until about 4AM. That's when I finally managed to catch the AIX specialist taking a break from his VT220 terminal. I picked his brain, going back over the specifics of the networking protocol spec. When that failed to yield results, I started picking his brain about how he implemented the protocol on the host. That's when I had a sudden 'lightbulb' moment.

It turns out that, on 80's era hardware running under real-time constraints, re-initializing the state array on his end for every update (ie roughly every couple 100ms) is a very expensive operation. To avoid the performance penalty, he initially allocated the array to a fixed maximum size, and wrote over the existing values with the new values on each update.

On my end, running C# on modern Windows hardware, array initialization is the cheap and 'safe' approach, so that's exactly what I did. Since I wasn't sending a fixed-length set large enough to blank out values outside of the new set, it was possible for stale values to persist in the overflow of the array.

If the old label existed in the overflow (which wasn't set to update), and the same label was present in the new set of values (which was set to update), the UI would read the value as either the old or the new one at random.

This would only occur under very specific conditions. The set of current labels had to be shorter than the previous set of labels. The same label had to be present in the current set and in the overflow. The overflow would persist as long as the size of a new set of labels was shorter than the set of labels the old label was contained in.
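
The stale-overflow condition can be reproduced with a small model. This is only a sketch of the mechanism as described, not the actual protocol: `HostBuffer`, `MAX_LABELS`, and the label names are hypothetical.

```python
MAX_LABELS = 8  # hypothetical fixed maximum size of the host-side array


class HostBuffer:
    """Fixed-size buffer that is overwritten in place and never cleared."""

    def __init__(self):
        self.slots = [None] * MAX_LABELS

    def update(self, entries):
        # Only the first len(entries) slots are overwritten; the tail keeps old data.
        for i, entry in enumerate(entries):
            self.slots[i] = entry

    def update_padded(self, entries):
        # One possible fix: always send the full length so old entries are blanked.
        padding = [None] * (MAX_LABELS - len(entries))
        self.update(list(entries) + padding)

    def values_for(self, label):
        return [slot[1] for slot in self.slots
                if slot is not None and slot[0] == label]


buggy = HostBuffer()
buggy.update([("gear", "UP"), ("flaps", "10"), ("speed", "250")])
buggy.update([("speed", "180")])  # shorter set: slot 2 still holds the old speed

fixed = HostBuffer()
fixed.update([("gear", "UP"), ("flaps", "10"), ("speed", "250")])
fixed.update_padded([("speed", "180")])
```

In the buggy buffer the label `speed` now has two conflicting values, one current and one stale, which is the flicker seen on the interface; in the padded version only the current value survives.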

Through sheer luck, I managed to get the code patched and working before I left for the hotel. Up until that point, I had worked mostly on the hardware side. I even planned and installed all the hardware for the update before switching back to code.

It's not like this was my first 'aha' troubleshooting moment, but it was the first time I had a full Boris-esque 'I am Invincible!' moment. TBH, I've been hooked ever since.

Max Cerrina • Jan 13 '18

Honest to god, reading that was a hell of a rush. Write more! That was amazing.

Forest Hoffman • Jan 12 '18

There were a few that occurred in development, so no production environments were affected (that I can remember). However, I did once mistype a sudo rm -rf command...

I intended to type ./ for the current directory, but failed to enter the period. So, it was after I hit enter that I realized I had run it with a / in front of everything. I had several backups, so I only lost a bit of work.

I still gave myself a minor heart attack.

Leah • Jan 13 '18

That's why you just write . so you can't accidentally delete everything 👀

Christopher McClellan • Jan 13 '18

How about “I’ve done this too.”?

Fixed a bug in our build system. (Commits direct on a branch instead of merging weren’t building.)
Our build system relied on itself to perform its own build.
Yup. Around 9000 deploys before we managed to shut it down and pin the build to an older version of itself...

Mohannad Najjar • Jan 13 '18

Two months ago I was a student and also a part-time developer for the main student services web-app at the university.

I deployed some updates and left the office for my lectures, 4 hours later I noticed the students talking to each other about something so messed up and crazy on the web-app.

I got back to the office, and realized that I accidentally changed the production database connection to the testing database.

Angshuman Halder • Jan 15 '18

So I wrote a bug where when you try to update some data from the main server db to your local server the data gets deleted from the main server and it gets updated in local server's db.

Phil Ashby • Jan 15 '18

Seems like a perfectly reasonable distributed storage/sharding solution... :)

Drew • Jan 13 '18

Oh man, I have a good one for this.

Full disclosure, I'm actually an FPGA engineer, rather than a software engineer. VHDL and Verilog may be hardware description languages, but they are close enough to code that I get good ideas from here.

My goofs usually result in damaging components, usually pretty expensive.

This is one of those stories.

I work for a medical device company, and my main project for the last 9 months or so has been a subsystem which precisely controls 100+ custom, purpose-built solenoids. Each solenoid is individually controlled by a PWM signal, and ours generally take an approx. 30% duty cycle (meaning the signal spends 30% of the time interval on, 70% off). You can give it a 100% duty cycle signal if it lasts less than a minute. We take advantage of this to move them faster, for very short periods of time (approx. 1 millisecond). The metaphor for this is like the gas pedal on your car. You generally don't floor it, but you can for brief periods of time when you need to.

Due to some bad luck and some faulty reset logic I wrote, each solenoid started receiving a PWM signal that was ~15 minutes on, ~15 minutes off approx. 15 minutes after power up. To return to the car metaphor, this would be like flooring it until your engine explodes. It toasted every single solenoid in the damn thing. In addition, this happened after the first time we installed it in the larger system, which is a ~1 hour process involving 3 people and a forklift, plus downtime on the larger machine.

Oops.

Adam McKerlie • Jan 13 '18

Not technically a bug but I was once testing SQL injection on our dev site and dropped our database DROP DATABASE dev;. It worked, I patched the form and then started receiving emails saying the site was down.

I wasn't on dev, and our production database was called dev.

Michael Minshew • Jan 13 '18

Why on earth would anyone name their prod env dev..... ???!?!?!?

Adam McKerlie • Jan 13 '18

My guess is after initially developing the app, instead of wanting to change the variable name they just recreated the DB in prod.

Best part, dev was called dev_dev

Maccabee • Jan 14 '18

I feel like dev and prod DBs shouldn't even be on the same machine.

Adam McKerlie • Jan 15 '18

They weren't. I was sql injecting a PHP app and accidentally did it on prod instead of dev.

Bruno Louzada • Jan 15 '18

LOL

Paceaux • Jan 13 '18

Worst thing I ever did was before I was a developer. I was trying to debug an issue with a product line in our content management system. We had Canadian products, US products, and then a small set of US products sold in Canada. That small set of US-in-Canada products weren't displaying images. So, I republished something from the CMS, and fixed it!

2 minutes later, every phone in the department was ringing, as the entire US product line was down. It was a Friday, and the busiest day of the month for traffic. On the one hand, I could argue that it wasn't my fault. Apparently there was a bug in how products were connected, and I'd just uncovered it. On the other hand, no one cares, the entire US product line is gone.

So, I quickly unpublished a thing, and the US product line was back. At which time, I went in to the director's office, confessed what I had done, and why it'd happened. (CMS had no archiving, and no staging environment, theoretically I could've gotten off scot-free). The Director said, "everyone gets one, this was yours"

As a developer... I crashed codewars.com. I mistakenly wrote some infinite, recursive loop in some code challenge. The browser got slow and unresponsive. Then the page froze up. I went to check the health of codewars.com (they actually had a health-check page somewhere), and it was reporting slow. then, it... was down.

Shalvah • Mar 21 '18

You crashed a site from your desk. Badass! 🙌🙌😂😂

Michiel Hendriks • Jan 13 '18 • Edited

Pressing "execute query" in graphical SQL client with the cursor on the first line for this content, in production:

update sometable 
set foo = ...,
bar = ...

where something = ... and ...

This error marked the end of me using auto commit.

Just in case you don't get it. The client treated blank lines as the end of a query.
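
With auto commit off, a truncated statement like that can still be rolled back. A minimal sketch of the idea, using Python's built-in sqlite3 as a stand-in for the production database; `guarded_update` and the `expected_rows` check are hypothetical additions, not a feature of any SQL client.

```python
import sqlite3


def guarded_update(conn, sql, params=(), expected_rows=1):
    """Run an UPDATE inside a transaction; roll back on an unexpected row count."""
    cur = conn.execute(sql, params)
    if cur.rowcount != expected_rows:
        conn.rollback()
        return False
    conn.commit()
    return True


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sometable (id INTEGER PRIMARY KEY, foo TEXT)")
conn.executemany("INSERT INTO sometable (id, foo) VALUES (?, ?)",
                 [(1, "a"), (2, "b"), (3, "c")])
conn.commit()

# What the client actually ran: the statement cut off before the WHERE clause.
truncated_ok = guarded_update(conn, "UPDATE sometable SET foo = 'x'")
# What was intended.
full_ok = guarded_update(conn, "UPDATE sometable SET foo = 'x' WHERE id = ?", (2,))
final_rows = conn.execute("SELECT id, foo FROM sometable ORDER BY id").fetchall()
```

The truncated statement touches all three rows, so it is rolled back; only the statement with the WHERE clause is committed.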

Bobby Priambodo • Jan 13 '18 • Edited

Ooh I have one (perhaps not a bug, but a mess-up nonetheless), back when I was still a student.

I'm not too sure how far I can disclose the details, but there was a group of (quite "fresh" and naive) CS students that got contracted to create a system to hold a picture-posting competition for employees of a (quite big) company; complete with multiple roles and permissions and such, while maintaining almost thousands of company branches. It worked well in dev... but the servers went down spectacularly on launch day.

Let's just say it involved showing a sophisticated dashboard aggregating data from multiple table joins without indexes, saving uploaded pictures (up to megabytes in size) in base64-encoded form on DB rows, doing many on-the-fly logic calculations... and PHP 5.

Wes • Jan 13 '18

The ones I didn't fix yet. Sneaky little bastards.

Yury • Jan 13 '18

Another developer and I, working on separate parts of our app, introduced the same bug. It was quite trivial: we both forgot to close resources which we should've closed. The problem is, my code was triggered by an external app constantly updating users and her code was triggered every few minutes by users working with our app. We had to restart the server every 4 hours or so for a couple of weeks until we figured out what the problem was.

Dennis Palmer • Jun 18 '18

In my first job back in the mid 90’s I was working on a program that would send a fax to every contact in our database. I started the execution so that it would run overnight. The next day we got a very angry call from the first contact in the list. Their entire roll of fax paper was on the floor. They had received our fax repeatedly until running out of paper.

Upon reviewing my code, I discovered that I had forgotten the line of code that moves to the next record in the while-not-at-end loop.
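
The shape of that loop, sketched in Python rather than whatever the original program was written in. `send_to_all` and the `max_sends` safety cap are hypothetical; the cap is just one way such a loop could fail loudly instead of emptying someone's fax roll.

```python
def send_to_all(contacts, send, max_sends=None):
    """Walk the contact list with a manual cursor, sending one fax per contact."""
    index = 0
    sends = 0
    while index < len(contacts):  # the "while not at end" loop
        if max_sends is not None and sends >= max_sends:
            raise RuntimeError("send cap reached; is the cursor advancing?")
        send(contacts[index])
        sends += 1
        index += 1  # the forgotten line: move to the next record
    return sends
```

Iterating directly (`for contact in contacts:`) avoids the manual cursor altogether, so there is no increment to forget.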

Nicolás Wernli • Jan 14 '18

Ooh I gave users access to upload any kind of file in a Profile Photo Upload feature. Some uploaded a script and got access to our database. The good thing, my uploads are really safe now 😂😂

Abubakar Nur Khalil • Apr 21 '18

I essentially wrote this somewhere in a Python codebase

while (some_cond):
   # Some code

Bug: some_cond was always initialized as 0 before the loop. So the code in the loop never ran.

Kept pulling my hair out till I was nearly bald before I realised it 😂

Marek Dabek • Jan 29 '18

I was working for a company that was making accounting SW for a mid-size logistics company. I needed to convert one table into another. The table contained purchase orders for the last 6 months. I wrote the code and went home in the evening. There was no validation team in my company; everything went straight to production (how foolish!). I woke up at 5am and realized that I'd messed up in the code, which was about to be deployed at 8am. I got dressed quickly, went to the office, fixed the bug and saved the day. If I hadn't done that, the company might have had an IRS audit within the next few months, due to the missing money. There were a few million dollars in danger and probably someone would have gone to jail. Never again have I worked in a company without a validation team.

Juanjo Salvador • Jan 16 '18

I was writing a music organizer, and a little bug deleted all my music (about 1200 mp3 files). The good part: it took only 25s.

Maccabee • Jan 14 '18

I took the company out of Google search.

I was working for a multinational company. This company relied heavily on being on the front page for multiple products.

They have sites for each country, for i18n, all generated from the same ASP.NET MVC code and a CMS.

One country requested us to allow a product page that wasn't indexed at all to be indexed for that country.

Somehow my logic removed all indexing from all products, except for that country. It was something that couldn't be tested by QA as indexing was turned off for that env.

It wasn't caught for a few days until one of the higher-ups, think C*O, noticed. There was a frantic rush to fix it.

Muhammad Arslan Aslam • Aug 27 '18

Yeah.
I wrote some code that had a setState in the render(). Function with the setState was hitting a client's service which was taken down within an hour after we released and we didn't realize that until the client told us!

Not the worst, but was definitely silly!

Anthony Delgado • May 13 '18

I once wrote a Twilio SMS app that would forward incoming messages to itself creating an infinite loop of messages, racking up hundreds of dollars in charges in a few minutes. Thankfully we caught the bug and Twilio was nice enough to refund the charges.

Eric McCormick • Jan 16 '18 • Edited

I read the docs, the docs were wrong.

Event loops are always fun. As a new developer to a proprietary platform, I followed some documentation while building out some potential combinations for matching type-ahead values (server-side computation, would then send possible results). Being a newbie to the platform, I followed some example documentation for iterating the collection type and building such a thing out. Where things get interesting is that the platform made use of Java handles on top of C objects, with no automatic garbage collection.

Following return from my first vacation at that job, I found out that the entire new app I had rolled out was reverted. Something to do with "the server kept crashing". Do tell. The best part, as the server had to aggregate enough memory handles to fill up before going down, I was able to show it working correctly after my return. Eventually it was found out and I cursed the documentation. That'll teach me to RTFM.

Dustin King • Jan 18 '18

When I added a new feature to an old and heavily used C++ server program, my code had a memory leak in it, that probably would have caused a lot of crashes in production. Fortunately the sysadmin put it through its paces in a "burn-in" period in the staging environment, so the issue was found and fixed before ever going to production.

Matt Ellen-Tsivintzeli • Jan 19 '18

I wrote a service that allowed a company to automate faxing trades (e.g. buying shares). It seemed to work in dev, but in production it fell over and everyone else had to rush around to fax the trades by hand 😶

Angel Daniel Munoz Gonzalez • Jan 13 '18

I wrote a piece of code that normally worked fine, but one of the least used features somehow hit a bunch of null/undefined conditions that were foreseen by neither the test cases nor my coworkers, and it ended up deleting several tables due to a bug in the ORM we used. So my bug triggered an underlying bug in our stack hahaha

Juan Julián Merelo Guervós • Jan 13 '18

I wrote a paper submission system for a conference, PPSN, back in 2000. Title, authors, paper PDF path were stored in a MySQL database. You would assume that scientific paper titles are shorter than 256 characters, right? They are NOT.
When weird paper titles started to show up and the chairperson noticed and told me, I had to reconstruct paper titles from the Apache log, after making a redefinition of the table that stored them. Everything was written in Perl, so it was not so big a deal, took a short time to fix. But then I'm in academia, so nothing is so big a deal.

mshwf • Jan 13 '18

I was in charge of developing a history feature. It serializes the object being updated as JSON and saves it in the history table. This object is mostly a master-detail model, meaning it has a sub-list, sometimes of 1000+ records, and the JSON string reached 300,000 characters. For every update I was saving this very long JSON as a single history log, even if the change occurred only in the head (master) object or in a few records from the list. This bug made the database grow very fast; within a few days its size was measured in tens of gigabytes. Thankfully I wasn't fired, and my manager kept his calm. I found a way to save only the changed data in the object.
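
Saving only the changed data could look roughly like this. It is a sketch under assumptions, not the actual fix: the function names are hypothetical, detail rows are assumed to carry an `id` key, and deleted rows are not tracked.

```python
import json


def changed_fields(old, new):
    """Top-level keys whose values differ; a removed key is recorded as None."""
    return {key: new.get(key)
            for key in sorted(old.keys() | new.keys())
            if old.get(key) != new.get(key)}


def changed_rows(old_rows, new_rows, key="id"):
    """Detail rows that are new or modified (deleted rows are not tracked here)."""
    old_by_key = {row[key]: row for row in old_rows}
    return [row for row in new_rows if old_by_key.get(row[key]) != row]


def history_entry(old_master, new_master, old_rows, new_rows):
    """Serialize only what changed, instead of the whole master-detail object."""
    return json.dumps({
        "master": changed_fields(old_master, new_master),
        "rows": changed_rows(old_rows, new_rows),
    }, sort_keys=True)
```

For an object with 1000 detail rows where one row and one master field change, the history entry holds just those two changes rather than the full serialized object.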

🍥 CDG 🍥 • Jan 13 '18 • Edited

Not a big bug: I tried to extract data from a 1.5 GB CSV file with 15 columns, put the data of each line in a temp DB, and then loop over it to extract each comma-separated word for some columns...

First part took almost an hour to complete, and the extract part...
crashed the school DB because the badly formatted CSV contained "" and , INSIDE the comma-separated data, and the script didn't handle it well for this amount of data. Good times tho.
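
Quotes and commas inside fields are exactly what breaks a plain split on commas. A small illustration using Python's standard csv module (the original script's language isn't stated, and the sample data is made up):

```python
import csv
import io

# Made-up sample: a comma inside one quoted field, doubled quotes inside another.
raw = 'id,name,comment\n1,"Smith, John","said ""hi"", then left"\n2,Doe,plain\n'

# Naive approach: split every line on commas.
naive = [line.split(",") for line in raw.splitlines()]

# csv.reader understands quoted fields and doubled quotes.
parsed = list(csv.reader(io.StringIO(raw)))
```

The naive split turns the first data row into five pieces instead of three, while `csv.reader` returns the three intended fields. Given an open file instead of a string, `csv.reader` yields rows one at a time, so a large file doesn't have to be held in memory at once.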

Jakub T. Jankiewicz • Jul 4 '18

I have two stories:

On my shared hosting, where I had a few websites, I created trypython, where I executed commands on the server (using CGI). I thought I had secured the thing, even though someone on StackOverflow said that it's not possible to sandbox standalone Python. You couldn't directly import modules that execute shell commands or other modules that import them. I thought it was OK until someone wiped out the whole disk for my account.
On my old laptop I had Linux installed and I was running out of disk space on my root partition. The /home/ partition still had room, so I moved the /tmp directory to /home and created a symlink in place of the original directory. It was working fine, but then I needed more space, so I moved the /usr directory, and while moving it my whole installation broke because commands could not find the shared libraries in /usr/lib. I was only able to use apps that were already running. I had a broken CD drive, it was not possible to boot from USB, and I didn't have an external drive to back up my data. I thought the machine was broken for good because it had a non-SATA DVD drive and you could only buy new SATA DVD drives. Fortunately I had Firefox running with FireFTP, so I connected via WiFi to my mother's laptop, where I installed an FTP server, and backed up all the data. A few years later I bought a used DVD drive that worked with that laptop and gave it to my mother.

Aidas Bendoraitis • Jan 13 '18

I don't currently remember the details, but once my script sent emails to thousands of users where only about 20 should have received them. That was a really frustrating day.

Another uncomfortable situation was not exactly code written by me, but the outcome of a not-good-enough configuration. Whenever an error happened on the server, we got a separate error-reporting message by email. And once the MySQL database crashed. Over a number of hours users' requests sent so many messages that they flooded the email server and nobody in our company could receive any more emails for a whole day or two. Later we switched error reporting to a server-based solution which combines error events into types of errors and sends only one notification by email for similar cases.
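
The grouping idea can be sketched in a few lines. This is not how any particular error-reporting service works internally; `ErrorAggregator` is a hypothetical name, and grouping by exception type plus message is an assumption made for the example.

```python
from collections import Counter


class ErrorAggregator:
    """Count every error, but notify only the first time each kind is seen."""

    def __init__(self, notify):
        self.notify = notify  # callable that would send the email
        self.counts = Counter()

    def report(self, exc):
        key = (type(exc).__name__, str(exc))
        self.counts[key] += 1
        if self.counts[key] == 1:
            self.notify(f"{key[0]}: {key[1]}")
```

A database outage that fails a thousand requests with the same error then produces one email and a count of 1000, rather than a thousand emails.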

Kim Kulling • Jan 14 '18

I was a developer for a company building medical devices. There was a simple EEPROM storing the operation hours. This EEPROM was connected to the rest of the system via a really unstable bus system. So my job was adding a checksum when accessing the EEPROM to make it safe. If the read value did not match the checksum -> try again 3 times and stop the system if there were still errors in the checksum. Unfortunately this bus was really unstable and somehow every subsystem was coupled to this readout-op from the EEPROM. My last delivery before starting a big beta-test-phase contained the fix.
And I warned my project-lead: do not deliver this fix at this point in time. He believed that the fix would work (it did, and it showed us how unstable the EEPROM access was), so we started the test-phase with it, and the whole device was blocked by one operation-hour-readout fix (my fix). Argh!!! And I was fucked! This summer was hot but I sat a lot in front of the device until it was fixed.
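
The read-verify-retry logic, sketched in Python for illustration only. The real firmware, checksum algorithm, and bus API are not described in the story, so the additive 8-bit checksum, `read_raw`, and the choice of 3 total attempts are all assumptions.

```python
class ReadError(Exception):
    """Raised when no attempt produced a payload matching its checksum."""


def checksum(data):
    # Simple 8-bit additive checksum; a stand-in for whatever the device used.
    return sum(data) & 0xFF


def read_with_retry(read_raw, attempts=3):
    """read_raw() returns (payload_bytes, stored_checksum); retry on mismatch."""
    for _ in range(attempts):
        payload, stored = read_raw()
        if checksum(payload) == stored:
            return payload
    raise ReadError(f"checksum mismatch after {attempts} attempts")


def flaky_bus(responses):
    """Test helper: a fake bus that replays a fixed sequence of reads."""
    it = iter(responses)
    return lambda: next(it)
```

If everything else in the system waits on this read and the bus fails often enough, the final `raise` is what halts the whole device, which matches what happened in the beta test.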

Vincenzo • Apr 1 '18

I was working on a legacy application for my ex-company. We were putting live a bunch of logic in some stored procedures to round up the price to make it end with a 9 or 5.
It was supposed to be feature-toggled off, but after 7 hours someone noticed some inconsistencies in prices, 1 pound here, 4 pounds there... Eventually I was like f*********ck. I went and checked the code, and basically I had made the if check the other way around:
if the feature is not active, apply the logic, rather than the right way... So yeah, my company had to pay back something like £1000.

Raymond Price • Jun 18 '18

Some years ago, I was working as the front-end developer at an e-commerce service company, providing smaller online retailers with recommendation services. The way it worked was that we loaded, on each client page, a JS config file containing client-specific customizations, a platform-specific library, and a file containing the core functionality. This also predated JS build tools as a real Thing, so we had to load each file separately rather than a single bundle.

Well, one day I was asked to add a certain feature for a particular client; this was a simple thing, just converting a piece of data from some computer-readable form to something more human-friendly, so I just put the function in the platform file and put the lookup table it referenced in the core, so it'd be available to other platforms if they needed it. It all worked fine, so I uploaded it to our CDN and waited for the change to propagate out.

A few days later I got a call about our code breaking a client's site. Apparently the new platform file had propagated out, but the core file hadn't, so when my new function went looking for its lookup table, it couldn't find it and broke the website of what was actually our largest client.

TLDR: I lost our largest client and almost my job because I didn't really know how CDNs worked.

Aaron • Jun 18 '18 • Edited

Thought I was being clever by creating an event on a static class (.NET), forgetting that every request on the website resulted in a handler being added, but never removed.

It took a while (several days) for the memory to fill up and the web application to crash.

I was on other stuff at the time they found out. Some people spent a lot of time finding and fixing that bug.

The worst part of it is I still clearly remembered the moment I "solved" the problem by creating an event on a static class. And all I wanted was to go back in time and tell myself: "careful Aaron ..."
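
The leak pattern translates to most languages. Below is a rough Python analogue, not the original .NET code: `AppEvents` stands in for the static class with the event, and the two request functions are hypothetical.

```python
class AppEvents:
    """Stand-in for a static event: one handler list shared by the whole process."""

    _handlers = []

    @classmethod
    def subscribe(cls, handler):
        cls._handlers.append(handler)
        return handler

    @classmethod
    def unsubscribe(cls, handler):
        cls._handlers.remove(handler)

    @classmethod
    def handler_count(cls):
        return len(cls._handlers)


def handle_request_leaky(request_id):
    # Every request adds a handler and never removes it, so the shared list
    # (and whatever each handler references) stays alive for the process lifetime.
    AppEvents.subscribe(lambda: request_id)


def handle_request_fixed(request_id):
    handler = AppEvents.subscribe(lambda: request_id)
    try:
        pass  # the actual request work would go here
    finally:
        AppEvents.unsubscribe(handler)  # always detach, even if the work fails
```

With the leaky version the handler count grows by one per request, which matches the slow, days-long memory fill described above.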

Also had the classic run-away e-mail service mail a customer. I was driving home at the time so didn't realise until my boss called me. Luckily the receiver was OK with it. Also, I let my boss know through Skype chat "the problem with the thousands of e-mails to the customer is fixed" and it popped up while he was giving a presentation to more bosses. So they sent their congratulations.

Can't make an omelette without breaking a few eggs.

Peter Ellis • Sep 3 '18

What the app was supposed to do is take in a series of files and link them together via a lookup table. The lookup table was stored in a database, but it was slow, so I decided to cache it. Unfortunately failed lookups would create a new entry in the table.

Even more unfortunately when that happened, the cache stored a different value to what ended up being written back to the database.

What made this bug particularly insidious is that because the actual values didn't really matter, just the fact that they linked up across a batch of files, it mostly worked. Except if a batch was imported across a longer period of time, and there happened to be a server reboot in the middle (forcing the cache to be re-created from the values in the database), things would go very wrong.

This went on for years without anybody being able to make the connection, so we could never reproduce it. It wasn't catastrophic but it was incredibly annoying.

Anyway, when the tester finally provided a full description of the circumstances she encountered the error in, I bought her a huge box of chocolates. (And fixed the bug in less than 10 minutes.)

Scott Hardy • Aug 24 '18

My first year in my career...

sudo chmod -R 0777 /

On a production server with 100+ clients on it. I focused on front-end earlier in my career because of that mistake.

Chad Kunde • Jan 16 '18

Before officially being a dev, I was tasked to pull a personnel data report for a CG. Since it was a lot of data, I built it as a set of smaller queries that were supposed to be joined in the summary stage. Unfortunately, the database "optimized" the set of queries into a single, giant query that joined across over a dozen tables.

Long story short, I crashed the main database for over two hours. The join would have resulted in over 10 trillion rows of data, so the staff had to force a reboot to get it to respond. Since that gave me an error, I ran the report again. By the time it was over, the db had been offline for most of the day. There was a very unhappy call directly to my desk from the team that ran the db server.

Chad Smith • Jan 13 '18

I have a few, including one that isn't really a bug, just a stupid mistake on my end.
I was getting ready to deploy a new version of a web app to staging. After the deploy I went and had our test team test it out for a little bit. An hour later I noticed a mistake...

The staging configuration pointed to the production database. Oops...

Kristian Tsivkov • Nov 18 '19 • Edited

Well.. well.. well.
It's not a bug, but definitely something funny..
While testing regular expressions in C, I was trying to get just the contents of the files in a directory, ignoring the whitespace, and export the result to new files. That's fine, but running something like this: "[^\s](.*)" and exporting the results to files in the same directory you are "scanning" is a bad idea. Running it with sudo on the server where your website is hosted is even worse. Before you realize what's wrong, you are going to be dropped from ssh, the CPU will be at 100% and your disk will be busy. Feels like an internal DoS.

Chathula Sampath • Jan 13 '18 • Edited

This is not actually a coding bug. Once I was writing a REST API. I needed an SSH tunnel to connect my API with some third-party APIs. That tunnel server used our company's production server IP on GCP. So I connected to the tunnel, but got an issue. I was using the GCP CLI. I saw a delete command and thought it would only delete the local copy. I just ran that command.

ok guys. thank you 😂😭

Matthew Watkins • Jan 13 '18

Wow, what's wrong with you guys? I never have any bugs in my code ;)

Vignesh M • Oct 24 '18 • Edited

😱😱 Same!! It was on production for a few mins before I fixed it

Abdul Ghani • Jan 13 '18

Same thing happened to me. That day I learned to double check my update queries before pushing them to the production server 😂

Wendy Stocker • May 19 '18 • Edited

A Redis caching plug-in that caused Varnish caching to be bypassed, rendering Varnish useless. I caused the site to crawl for days, and lots of angry faces. DOH!

Ben Sinclair • Apr 16 '18

"UPDATE users SET first_name='Martin'; WHERE user_id=12345"

30,000 rows updated.

Slony replication begins, 100+ remote systems fail under load.
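
One possible guard against this kind of stray semicolon, sketched in Python with the standard `sqlite3` module purely for illustration. The original system was presumably PostgreSQL, given the Slony mention; the `users` table here is a three-row toy, and `make_demo_db` and `guarded_update` are hypothetical helpers. The idea is to run the statement inside a transaction and commit only if it touched the number of rows you expected.

```python
import sqlite3


def make_demo_db():
    """Build a tiny in-memory users table (toy data, not the real schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, first_name TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [(12345, "Marty"), (2, "Ann"), (3, "Bob")],
    )
    conn.commit()
    return conn


def guarded_update(conn, sql, params, expected_rows):
    """Run one UPDATE and commit only if it touched exactly expected_rows rows."""
    cur = conn.execute(sql, params)  # sqlite3 opens a transaction implicitly for UPDATE
    if cur.rowcount != expected_rows:
        conn.rollback()  # undo the over-broad update before anyone sees it
        raise RuntimeError(
            f"expected {expected_rows} row(s), statement touched {cur.rowcount}"
        )
    conn.commit()
    return cur.rowcount
```

Calling `guarded_update(conn, "UPDATE users SET first_name = ?", ("Martin",), expected_rows=1)` on the toy table raises and rolls back, because without the WHERE clause the statement touches every row. Adding `WHERE user_id = ?` with `12345` makes it pass.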

Wendy Stocker • May 19 '18

I can't tell you how many times a semicolon has ruined my day.

Shalvah • Mar 21 '18

Your program had access to their raw passwords?

mr-lab • Oct 16 '18 • Edited

Well, this is the shittiest bug I have ever come across: youtube.com/watch?v=YTr4rFzAwBQ

Dan Benge • Apr 26 '18

That kind of thing always happened to me whenever I did that "One quick untested fix" before I checked the code in.

Chathula Sampath • Jan 13 '18

what did you do to fix it?

pranay rauthu • Sep 9 '18

I used to work on a .NET application which included both WebForms and MVC projects. Once I added ASPX view engine code in a Razor page. I still can't remember how I could do such a weird thing.

Steven • Mar 18 '18 • Edited

Love this GIF, I would be freaking out if I made the same mistake.

Evaldas Buinauskas • Jan 16 '18 • Edited

Wrote a SQL script that took too long to execute. It took a day to roll back the whole transaction. Good thing it was an internal pre-production environment.

Phil Ashby • Jan 15 '18

Is this your first admission of the offence, or did you fess up between then and now? Are you /still/ working there today? 8-0