What was the worst bug you've ever written?

So I just wrote my worst bug ever.

Any stories to make me feel better?

Already told this story as a tweet responding to the above, but longer stories are always fun.

I set up an RPi Zero with a webcam to monitor my front door for potential visitors. I wanted it to text me any time someone was at the door. Originally I tried to use the text-email gateway for my provider, but emails sent from the RPi just ended up looking like spam, and I didn't want to bother setting up an email relay, so I looked into using Twilio. Thankfully they have a nice little script you can use to text a given number; you just have to supply your API key and get a Twilio number to text from. The software I was using, motion, also has a bunch of nifty hooks, including one for when it detects an "event" of motion, the definition of which you can tune. Awesome! So I plugged the script into that, restarted motion, and started waving my hand in front of the camera!

I then proceeded to get about 500 texts in the span of about a minute.

Eventually, after shutting everything down, I figured out what I did wrong. I had put the script into the on_motion hook, instead of the on_motion_event hook. The former hook is if you want to trigger something any time a frame of motion is detected. Wave your hand in front of that a bunch, and you're bound to trigger that hook many times. So not only did I manage to flood my phone with texts, but I also burnt through a good amount of Twilio trial credits.
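
A cooldown inside the notification script would have limited the damage no matter which hook fired. Here is a minimal sketch, assuming the hook launches a fresh process on every trigger (so the last-sent time has to live on disk rather than in memory). The `should_notify` name, the stamp file path, and the 60-second interval are all made up for illustration, and the check is not atomic across simultaneous runs.

```python
import os
import time


def should_notify(stamp_path, interval, now=None):
    """Return True at most once per `interval` seconds, across separate runs.

    The modification time of `stamp_path` records when we last notified.
    """
    now = time.time() if now is None else now
    try:
        last = os.path.getmtime(stamp_path)
    except OSError:
        last = None  # no stamp file yet: first notification ever
    if last is not None and now - last < interval:
        return False
    # Create the stamp file if needed, then record the notification time.
    with open(stamp_path, "a"):
        pass
    os.utime(stamp_path, (now, now))
    return True
```

The script would then only call the texting code when `should_notify("/tmp/door_sms.stamp", 60)` returns True.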

I was once testing sudoers and happened to name the account f*ckyou@companydomain.com. Well, I made a mistake in the sudoers setup and caused it to send the traditional "f*ckyou@companydomain.com is not authorized" message to the support email address, which had about 60 people on it, ranging from business folks to Unix admins...

This is pretty great

Had a similar one. I had just rewritten a bunch of cron jobs that send time sensitive emails.

There was one cron job that was supposed to run every minute and check the status of a bunch of records that were created 20 minutes before.

When I was testing this little script I changed the time span where the record needed to be created from 1 minute to something like three days and I forgot to change it back before deploying.

When we deployed the change everything went fine because the script wasn't supposed to run till the following morning at around 9am.

Well the morning after I got to the office and everybody was going around crazy because some customers had already received ~30 mails each. I ran to my computer and started killing the container that was sending the emails before going through the source code and finding where the problem was.

I sent around 10k emails in 10 minutes.

Never have I ever written a cron job without a unit test after that.

Wait...cron jobs have unit tests?

I was messing with a (dumb) coworker. I told him that 'rm -rf /' would fix all the issues in his code. As a joke I added it to the top of the loader for our web app... and accidentally pushed it to a production server. Long story short, I lost my job and had to rewrite everything.

Yikes! This is exactly why we tell people not to make light of death commands. :P You make that mistake once, and never again!

Such a benign version, but I still remember how super upset I was as a kid and someone told me alt+f4 would refresh the page or something.

"there's this really cool Gameboy cheat code for Tetris, if you press 'Select, Start, B, and A' all at the same time" is a really cool/cruel joke kids of the early 90s would tell each other ...

more related to your benign example (and even more benign): when i was first really tinkering with Linux back in like 2001/2002, i couldn't figure out how to get out of this program i was using (you'd think it would be Vi, but it was probably just a Man page) and a co-worker suggested Alt-F4-- while that did technically get me unstuck from my issue, it took me another day or so to figure out what i'd actually done

Oh man... My heart really sank for you...

It did fix all code issues though :)

Are you being serious? That is remarkable.

I once ran rm -rf / in a virtual machine just for fun to see what would happen.

The VM had some shared disks with my actual OS and it started deleting everything that I had on my desktop and some other important folders.

I was long overdue for a backup, so I ended up losing a crap ton of data, but now I will never forget to check for shared disks when I'm working in a VM.

I think you win this round of "who's made the biggest mistake"

To your credit you identified a number of workflow issues doing that:

  • peer review anyone?
  • the code only /exists/ on production (where's the source control)?
  • firing you /before/ asking you to re-write seems pretty short sighted..

I hope you've got a job somewhere more thoughtful now :)

Thank you for trying to credit me, a few issues there haha. I was the primary reviewer, I had worked on everything so long that no one really bothered reviewing my code. We had two git repos, one development & one production. I pushed the commit to the production repo instead of the development one haha.

Holy sh*t. My boss would've killed me if I was in your position.

In a SQL update statement, always ALWAYS write the WHERE first, or you might end up giving (I think it was) about 130,000 production users the same password.
I spent the rest of the day writing a program to puzzle together the as-was state from various subsystems and the test environment, because I was fairly new there and didn't want to get fired for confessing and asking for a database restore.
They didn't find out until I told them months later and we all had a good laugh.
Still work there.
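
For anyone who wants a seatbelt for this, here is a minimal sketch of the idea: run the UPDATE inside a transaction and only commit if it touched the number of rows you expected. It uses Python's built-in sqlite3 purely for illustration; `guarded_update` and the table in the usage note are hypothetical, not anything from the story above.

```python
import sqlite3


def guarded_update(conn, sql, params, expected_rows):
    """Run an UPDATE and commit only if it touched `expected_rows` rows.

    Anything else is treated as a mistake (e.g. a missing WHERE) and
    rolled back before raising.
    """
    cur = conn.execute(sql, params)
    if cur.rowcount != expected_rows:
        conn.rollback()
        raise RuntimeError(
            f"expected {expected_rows} row(s), touched {cur.rowcount}; rolled back"
        )
    conn.commit()
    return cur.rowcount
```

With this, `guarded_update(conn, "UPDATE users SET pw = ?", ("x",), 1)` on a multi-row table raises and leaves the data alone, while the same statement with `WHERE id = ?` goes through.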

Same thing happened with me. That day I learned to double check my update queries before pushing them to the production server 😂

Is this your first admission of the offence, or did you fess up between then and now? Are you /still/ working there today? 8-0

"They didn't find out until I told them months later and we all had a good laugh.
Still work there."

I wrote a program that connected to the prod, test and auth environments and could puzzle together where each user had last logged in or changed a password.
The program then used those last-known values (that predated my own) to update the production auth tables.
It took about 20 minutes to run but I didn't dare to look and so I went for a walk.
When I came back I was very pleased to see the hashes and salts were all different again.
When I still hadn't heard any major complaints about a week later, I got the impression I had gotten it right ;)
That was a long week!

Your program had access to their raw passwords?

No no. Just hashes and salts.

Many years ago, I tried to use some deeply nested for loops in ActionScript 3.0 code I wrote to find the right combination of values in an XML template to match the hash of a given file. (The goal was to determine what exactly had changed in the file.)

About 20 minutes into waiting for it to finish, I decided to figure out how long it was going to take.

I determined mathematically it would finish on June 18, 2598 at 8:42 AM.

I learned two things that day:

1) One-way hashes are called that for a reason.

2) Deeply nested for loops are the work of the Devil anyway.

Well how's the script coming along now? Any celebrations planned for 2598?

I don't have that much patience, so I just aborted it instead. :P

I discovered INSERT with multiple rows. I had tried to insert 250k rows into a remote database with one INSERT command per row, and it was going to take 5 days, hahaha. With a single INSERT carrying all 250k rows it took < 5 mins.
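
A rough sketch of the multi-row pattern, using Python's built-in sqlite3 only so it runs anywhere (the `items` table and the chunk size of 400 are assumptions for illustration, and a local SQLite file will not show the same speedup as a remote server where every statement is a network round trip):

```python
import sqlite3


def insert_multi_row(conn, rows, chunk_size=400):
    """Insert (id, name) rows using multi-row VALUES lists.

    Rows are sent `chunk_size` at a time so each statement stays under
    the database's limit on bound parameters.
    """
    with conn:  # one transaction: commit on success, roll back on error
        for start in range(0, len(rows), chunk_size):
            chunk = rows[start:start + chunk_size]
            placeholders = ", ".join(["(?, ?)"] * len(chunk))
            flat = [value for row in chunk for value in row]
            conn.execute(
                f"INSERT INTO items (id, name) VALUES {placeholders}", flat
            )
```

Most database drivers also offer some batch call (sqlite3 has `executemany`), which gets a similar effect without building the SQL string by hand.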

I have an entire post on my old blog (rip... I should really write more technical stuff 😒) about a ternary operator. No spoilers, but I identified 5.5 issues with this single line of code. I even received a comment from someone who never commented before pointing out more bugs. Try to "enjoy." :)


Two months ago I was a student and also a part-time developer for the university's main student services web app.

I deployed some updates and left the office for my lectures. Four hours later I noticed students talking to each other about something messed up and crazy on the web app.

I got back to the office, and realized that I accidentally changed the production database connection to the testing database.

Nope. You messed up Ben and should feel bad. I'm just kidding dude. This isn't even that bad.

I once worked on a project where every month or two some developer would sneak in a cyclic dependency that required the database to be up before the application could come up. This meant that we could not run migrations on a blank database anymore because the code would assume there already was a working database. As you can imagine this would always halt CI in its tracks because we couldn't bootstrap the system anymore. I believe folks in cybernetics call this a strange loop.

Hope you learn from your mistake and my story and avoid future strange loops.

I stopped the line at a major North American steel producer for the better part of a day.

I wrote a database migration to add a column and was as happy as a clam! I deployed it to production and thought everything would be fine.

Five minutes later I realize that my migration also involved updating every single user record and the database is locked and no one can access prod and OOPS 🙈

PS. You all did a great job handling the downtime ❤️

Yasss! I did this too. 😂😂

Network communication bug on a custom ping-pong handshake protocol caused by stale state in an array overflow.

For a little perspective: I was writing a graphical multi-touchscreen front-end for an '80s-era commercial flight simulator. The sim host ran an early-'80s-era realtime AIX environment. Booting the damn thing took about 15 minutes and involved following a 'bootstrap loader' procedure whereby you'd have to fat-finger the boot instructions into a keypad in hexadecimal. Ethernet networking didn't exist when it was built, so we had to have an ethernet card custom built. On the software end we were working with an AIX specialist from the UK who designed the protocol, implemented a bare-bones ethernet driver, and provided a client/server implementation on the host end.

Our AIX guy had done this before on other similar machines, but it was my first journey into network programming. I wouldn't have physical access to the host until integration, so the AIX guy gave me a mini client/server simulator app written in C that I could train my client/server implementation against.

My end was entirely written in C#, where trying to do layer-2 networking is difficult enough. Either way, I found a PCAP wrapper (i.e. SharpPcap) to hack together a networking protocol, translated a floating point format converter from C (the host didn't use IEEE 754), reverse engineered a raw dump of the symbol table and loaded it into a DB, etc...

Integration finally came and I was a nervous wreck. We had 7 full days of downtime to make the changes, including ripping out the old hardware (ie ray-tracing displays + 100+ buttons) and replacing it with the new (3 touch screens). Flight simulators are the backbone of an airline, pilots who can't keep up with the FAA requirements for required training are grounded until they can. Therefore, flight simulators typically operate 20 hours/day, 7 days/week. Downtime beyond our scheduled window was not an option.

Months of preparation and death-march sprinting was finally yielding results. Everything was working brilliantly to the point where we could even start identifying opportunities to make adjustments and performance improvements.

That was, until somebody noticed something strange happening on the interface. Instead of updating to show the current state of the host as expected, a small select number of labels were constantly toggling between the correct value and something else. The symptom only occurred on a few pages, and only when the pages were loaded in a specific order.

I racked my brain for hours, chugging coffee and poring over every detail of my networking code until about 4AM. That's when I finally managed to catch the AIX specialist taking a break from his VT220 terminal. I picked his brain, going back over the specifics of the networking protocol spec. When that failed to yield results, I started picking his brain about how he implemented the protocol on the host. That's when I had a sudden 'lightbulb' moment.

It turns out that, on '80s-era hardware running under real-time constraints, re-initializing the state array on his end for every update (i.e. roughly every couple hundred ms) is a very expensive operation. To avoid the performance penalty, he initially allocated the array to a fixed maximum size, and wrote over the existing values with the new values on each update.

On my end, running C# on modern Windows hardware, array initialization is the cheap and 'safe' approach, so that's exactly what I did. Since I wasn't sending a fixed-length set large enough to blank out values outside of the new set, it was possible for stale values to persist in the overflow of the array.

If the old label existed in the overflow (which wasn't set to update), and the same label was present in the new set of values (which was set to update), the UI would read either the old or the new value at random.

This would only occur under very specific conditions. The set of current labels had to be shorter than the previous set of labels. The same label had to be present in the current set and in the overflow. The overflow would persist as long as the size of a new set of labels was shorter than the set of labels the old label was contained in.
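
The failure mode is easier to see in a toy version. This is only a sketch in Python, not the actual host or client code: `MAX_LABELS`, `host_update`, and `pad_to_fixed_length` are invented names, and padding the outgoing set is just one possible fix (the original patch isn't described above).

```python
MAX_LABELS = 8  # hypothetical fixed capacity of the host-side array


def host_update(buffer, labels):
    """Host-style update: overwrite slots in place, never clear the tail."""
    for i, item in enumerate(labels):
        buffer[i] = item
    return buffer


def pad_to_fixed_length(labels, size=MAX_LABELS, blank=None):
    """One possible client-side fix: always send a full-length set so
    every slot beyond the new labels is blanked out."""
    return list(labels) + [blank] * (size - len(labels))


buffer = [None] * MAX_LABELS
host_update(buffer, [("alt", 100), ("hdg", 90), ("spd", 250)])
host_update(buffer, [("spd", 260)])  # shorter set: slots 1 and 2 keep old data

# "spd" now appears twice: the fresh value in slot 0 and a stale one in slot 2.
duplicates = [entry for entry in buffer if entry and entry[0] == "spd"]
```

Sending `pad_to_fixed_length([("spd", 260)])` instead overwrites the stale slots, so only one "spd" entry survives.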

Through sheer luck, I managed to get the code patched and working before I left for the hotel. Up until that point, I had worked mostly on the hardware side. I even planned and installed all the hardware for the update before switching back to code.

It's not like this was my first 'aha' troubleshooting moment, but it was the first time I had a full Boris-esque 'I am Invincible!' moment. TBH, I've been hooked ever since.

Honest to god, reading that was a hell of a rush. Write more! That was amazing.

There were a few that occurred in development, so no production environments were affected (that I can remember). However, I did once mistype a sudo rm -rf command...

I intended to type ./ for the current directory, but failed to enter the period. So, it was after I hit enter that I realized I had run it with a / in front of everything. I had several backups, so I only lost a bit of work.

I still gave myself a minor heart attack.

That's why you just write . so you can't accidentally delete everything 👀

How about “I’ve done this too.”?

Fixed a bug in our build system. (Commits direct on a branch instead of merging weren’t building.)
Our build system relied on itself to perform its own build.
Yup. Around 9000 deploys before we managed to shut it down and pin the build to an older version of itself...

So I wrote a bug where when you try to update some data from the main server db to your local server the data gets deleted from the main server and it gets updated in local server's db.

Seems like a perfectly reasonable distributed storage/sharding solution... :)

Oh man, I have a good one for this.

Full disclosure, I'm actually an FPGA engineer, rather than a software engineer. VHDL and Verilog may be hardware description languages, but they are close enough to code that I get good ideas from here.

My goofs usually result in damaging components, usually pretty expensive.

This is one of those stories.

I work for a medical device company, and my main project for the last 9 months or so has been a subsystem which precisely controls 100+ custom, purpose-built solenoids. Each solenoid is individually controlled by a PWM signal, and ours generally take an approx. 30% duty cycle (meaning the signal spends 30% of the time interval on, 70% off). You can give it a 100% duty cycle signal if it lasts less than a minute. We take advantage of this to move them faster, for very short periods of time (approx. 1 millisecond). The metaphor for this is the gas pedal on your car. You generally don't floor it, but you can for brief periods of time when you need to.

Due to some bad luck and some faulty reset logic I wrote, each solenoid started receiving a PWM signal that was ~15 minutes on, ~15 minutes off, approx. 15 minutes after power up. To return to the car metaphor, this would be like flooring it until your engine explodes. It toasted every single solenoid in the damn thing. In addition, this happened after the first time we installed it in the larger system, which is a ~1 hour process involving 3 people and a forklift, plus downtime on the larger machine.


Not technically a bug, but I was once testing SQL injection on our dev site and dropped our database with DROP DATABASE dev;. It worked, I patched the form, and then I started receiving emails saying the site was down.

I wasn't on dev, and our production database was called dev.

Why on earth would anyone name their prod env dev..... ???!?!?!?

My guess is after initially developing the app, instead of wanting to change the variable name they just recreated the DB in prod.

Best part, dev was called dev_dev

I feel like dev and prod DBs shouldn't even be on the same machine.

They weren't. I was sql injecting a PHP app and accidentally did it on prod instead of dev.

Worst thing I ever did was before I was a developer. I was trying to debug an issue with a product line in our content management system. We had Canadian products, US products, and then a small set of US products sold in Canada. That small set of US-in-Canada products weren't displaying images. So, I republished something from the CMS, and fixed it!

2 minutes later, every phone in the department was ringing, as the entire US product line was down. It was a Friday, and the busiest day of the month for traffic. On the one hand, I could argue that it wasn't my fault. Apparently there was a bug in how products were connected, and I'd just uncovered it. On the other hand, no one cares, the entire US product line is gone.

So, I quickly unpublished a thing, and the US product line was back. At which time, I went in to the director's office, confessed what I had done, and why it'd happened. (The CMS had no archiving and no staging environment, so theoretically I could've gotten off scot-free.) The director said, "Everyone gets one; this was yours."

As a developer... I crashed codewars.com. I mistakenly wrote some infinite, recursive loop in some code challenge. The browser got slow and unresponsive. Then the page froze up. I went to check the health of codewars.com (they actually had a health-check page somewhere), and it was reporting slow. Then it... was down.

You crashed a site from your desk. Badass! 🙌🙌😂😂

Pressing "execute query" in graphical SQL client with the cursor on the first line for this content, in production:

update sometable 
set foo = ...,
bar = ...

where something = ... and ...

This error marked the end of me using auto commit.

Just in case you don't get it. The client treated blank lines as the end of a query.

Ooh I have one (perhaps not a bug, but a mess-up nonetheless), back when I was still a student.

I'm not too sure how far I can disclose the details, but there was a group of (quite "fresh" and naive) CS students that got contracted to create a system to hold a picture-posting competition for employees of a (quite big) company; complete with multiple roles and permissions and such, while maintaining almost thousands of company branches. It worked well in dev... but the servers went down spectacularly on launch day.

Let's just say it involved showing a sophisticated dashboard aggregating data from multiple table joins without indexes, saving uploaded pictures (up to megabytes in size) in base64-encoded form on DB rows, doing many on-the-fly logic calculations... and PHP 5.

I once wrote an SQL query to fix reversed endian inserted values due to a bug. It worked great for the incorrect values, but unfortunately it set all values that were correct to null, which was more than 99% of the data. Fortunately there was a backup, although it was not the fastest and took 2 days to restore.

Me and some other developer working on separate parts of our app introduced the same bug. It was quite trivial, we both forgot to close resources which we should've closed. The problem is, my code was triggered by an external app constantly updating users and her code was triggered every few minutes by users working with our app. We had to restart the server every 4 hours or so for a couple of weeks until we figured out what was the problem

I essentially wrote this somewhere in a Python codebase

while (some_cond):
   # Some code

Bug: some_cond was always initialized as 0 before the loop. So the code in the loop never ran.

Kept pulling my hair out till I was nearly bald before I realised it 😂

I took the company out of Google search.

I was working for a multinational company. This company relied heavily on being on the front page for multiple products.

They have sites for each country, for i18n, all generated from the same ASP.NET MVC code and a CMS.

One country asked us to allow a product page that wasn't indexed at all to be indexed for that country.

Somehow my logic removed all indexing from all products, except for that country. It was something that couldn't be tested by QA as indexing was turned off for that env.

It wasn't caught for a few days until one of the higher-ups, think C*O, noticed. There was a frantic rush to fix it.

I was working for a company that was making accounting software for a mid-size logistics company. I needed to convert one table into another. The table contained purchase orders for the last 6 months. I wrote the code and went home in the evening. There was no validation team in my company; everything went straight to production (how foolish!). I woke up at 5am and realized that I'd messed up the code, which was about to be deployed at 8am. I dressed quickly, went to the office, fixed the bug and saved the day. If I hadn't done that, the company might have had an IRS audit within the next few months because of the missing money. A few million dollars were in danger and someone probably would have gone to jail. I have never again worked in a company without a validation team.

When I added a new feature to an old and heavily used C++ server program, my code had a memory leak in it, that probably would have caused a lot of crashes in production. Fortunately the sysadmin put it through its paces in a "burn-in" period in the staging environment, so the issue was found and fixed before ever going to production.

I read the docs, the docs were wrong.

Event loops are always fun. As a new developer to a proprietary platform, I followed some documentation while building out some potential combinations for matching type-ahead values (server-side computation, would then send possible results). Being a newbie to the platform, I followed some example documentation for iterating the collection type and building such a thing out. Where things get interesting is that the platform made use of Java handles on top of C objects, with no automatic garbage collection.

Following return from my first vacation at that job, I found out that the entire new app I had rolled out was reverted. Something to do with "the server kept crashing". Do tell. The best part, as the server had to aggregate enough memory handles to fill up before going down, I was able to show it working correctly after my return. Eventually it was found out and I cursed the documentation. That'll teach me to RTFM.

The ones I didn't fix yet. Sneaky little bastards.

I wrote a service that allowed a company to automate faxing trades (e.g. buying shares). It seemed to work in dev, but in production it fell over and everyone else had to rush around to fax the trades by hand 😢

I wrote a piece of code that normally worked fine, but one of the least-used features somehow hit a bunch of null/undefined conditions that neither the test cases nor my coworkers had foreseen. It ended up deleting several tables due to a bug in the ORM we used, so my bug triggered an underlying bug in our stack hahaha

I wrote a paper submission system for a conference, PPSN, back in 2000. Title, authors, paper PDF path were stored in a MySQL database. You would assume that scientific paper titles are shorter than 256 characters, right? They are NOT.
When weird paper titles started to show up and the chairperson noticed and told me, I had to reconstruct paper titles from the Apache log, after making a redefinition of the table that stored them. Everything was written in Perl, so it was not so big a deal, took a short time to fix. But then I'm in academia, so nothing is so big a deal.

I was in charge of developing a history feature. It serialized the object being updated as JSON and saved it in the history table. The object is mostly a master-detail model, meaning it has a sub-list, sometimes of 1000+ records, and the JSON string could reach 300,000 characters. For every update I was saving this very long JSON as a single history log, even if the change only touched the head (master) object or a few records from the list. This bug made the database grow very fast; within a few days its size was measured in tens of gigabytes. Thankfully I wasn't fired, and my manager kept his calm. I found a way to save only the changed data in the object.
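
Saving only the changed data can be sketched roughly like this in Python. It assumes the object is a plain dict with a list of detail rows that each carry an id; `diff_flat`, `diff_master_detail`, and the `items`/`id` key names are hypothetical, not the original implementation.

```python
_MISSING = object()


def diff_flat(old, new):
    """Map each changed key to its new value (None marks a removed key)."""
    return {
        key: new.get(key)
        for key in old.keys() | new.keys()
        if old.get(key, _MISSING) != new.get(key, _MISSING)
    }


def diff_master_detail(old, new, detail_key="items", id_key="id"):
    """Build a compact history entry: changed header fields plus only the
    detail rows that were added, changed, or removed."""
    old_head = {k: v for k, v in old.items() if k != detail_key}
    new_head = {k: v for k, v in new.items() if k != detail_key}
    old_rows = {row[id_key]: row for row in old.get(detail_key, [])}
    new_rows = {row[id_key]: row for row in new.get(detail_key, [])}
    entry = {"head": diff_flat(old_head, new_head), "rows": {}}
    for row_id in old_rows.keys() | new_rows.keys():
        if old_rows.get(row_id) != new_rows.get(row_id):
            entry["rows"][row_id] = new_rows.get(row_id)  # None means deleted
    return entry
```

The history table then stores the small `entry` (for example via `json.dumps`) instead of the whole object on every update.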

I have a few, but one that isn't really a bug, but a stupid mistake on my end.
I was getting ready to deploy a new version of a web app to staging. After the deploy I went and had our test team test it out for a little bit. An hour later I noticed a mistake...

The staging configuration pointed to the production database. Oops...

Not a big bug: I tried to extract data from a 1.5 GB CSV file with 15 columns, put the data from each line in a temp DB, and then loop over it to extract each comma-separated word for some columns...

The first part took almost an hour to complete, and the extract part...
crashed the school DB, because the badly formatted CSV contained "" and , INSIDE the comma-separated data, and the script didn't handle it well for this amount of data. Good times tho.
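
For the record, quoted commas and doubled quotes are exactly what a real CSV parser is for. A small Python sketch of the difference (the sample data is invented; the original script and file aren't shown here):

```python
import csv
import io


def naive_split(line):
    """What a hand-rolled parser tends to do: split on every comma."""
    return line.split(",")


def parse_rows(text):
    """Parse CSV text with the csv module, which honors quoted fields."""
    return list(csv.reader(io.StringIO(text)))


sample = 'id,name,comment\n1,"Smith, Jane","said ""hi"""\n'
rows = parse_rows(sample)           # second row has 3 fields, as intended
broken = naive_split(sample.splitlines()[1])  # 4 pieces: the name got split
```

For a 1.5 GB file you would iterate over `csv.reader(open(path, newline=""))` row by row rather than building a list.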

Ooh I gave users access to upload any kind of file in a Profile Photo Upload feature. Some uploaded a script and got access to our database. The good thing, my uploads are really safe now 😂😂

I don't currently remember the details, but once my script sent emails to thousands of users where only about 20 should have received them. That was a really frustrating day.

Another uncomfortable situation was not exactly written by me, but was the outcome of not good enough configuration. When an error was happening on the server, we were getting a separate error-reporting message by email. And once MySQL database crashed. Over a number of hours users' requests sent so many messages that they flooded the email server and nobody in our company could receive any more emails for the whole day or two. Later we switched error reporting to server-based solution which combines error events into types of errors and sends only one notification by email for similar cases.
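
The "combine similar errors, send one notification" idea can be sketched in a few lines of Python. This is only an illustration of the grouping step, not the server-based tool mentioned above; `ErrorDigest` is a made-up name and grouping by exception class is an assumption.

```python
from collections import Counter


class ErrorDigest:
    """Collect errors and emit one summary line per error type,
    instead of one email per error."""

    def __init__(self):
        self.counts = Counter()
        self.first_message = {}

    def record(self, exc):
        kind = type(exc).__name__
        self.counts[kind] += 1
        # Keep the first message seen as a representative example.
        self.first_message.setdefault(kind, str(exc))

    def flush(self):
        """Return one summary line per error type and reset the digest."""
        lines = [
            f"{kind} x{count}: {self.first_message[kind]}"
            for kind, count in sorted(self.counts.items())
        ]
        self.counts.clear()
        self.first_message.clear()
        return lines
```

Calling `flush()` on a timer and mailing the result means a crashed database produces one line with a big count rather than thousands of messages.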

I was a developer for a company building medical devices. There was a simple EEPROM storing the operation hours. This EEPROM was connected to the rest of the system via a really unstable bus system. So my job was to add a checksum when accessing the EEPROM to make it safe. If the value read did not match the checksum -> try again 3 times and stop the system if there were still checksum errors. Unfortunately this bus was really unstable, and somehow every subsystem was coupled to this readout operation from the EEPROM. My last delivery before starting a big beta test phase contained the fix.
And I warned my project lead: do not deliver this fix at this point in time. He believed that the fix would work (which it did: it showed us how unstable the EEPROM access was), so we started the test phase with it, and the whole device was blocked by one operation-hour-readout fix (my fix). Argh!!! And I was fucked! That summer was hot, but I sat in front of the device a lot until it was fixed.
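
The read-verify-retry logic described above looks roughly like this as a Python sketch. CRC32 is an assumption (the actual checksum isn't named), `read_raw` is a hypothetical callable standing in for the bus access, and "3 times" is read here as three attempts in total.

```python
import zlib


class ChecksumError(Exception):
    """Raised when no attempt produced a payload matching its checksum."""


def read_with_checksum(read_raw, attempts=3):
    """Call read_raw() up to `attempts` times until payload and CRC agree.

    read_raw must return a (payload_bytes, stored_crc) tuple.
    """
    for _ in range(attempts):
        payload, stored_crc = read_raw()
        if zlib.crc32(payload) == stored_crc:
            return payload
    raise ChecksumError(f"checksum mismatch after {attempts} attempts")
```

The catch in the story is what the caller does with `ChecksumError`: if every subsystem depends on this read and the reaction is "stop the system", a flaky bus stops everything.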

Before officially being a dev, I was tasked to pull a personnel data report for a CG. Since it was a lot of data, I built it as a set of smaller queries that were supposed to be joined in the summary stage. Unfortunately, the database "optimized" the set of queries into a single, giant query that joined across over a dozen tables.

Long story short, I crashed the main database for over two hours. The join would have resulted in over 10 trillion rows of data, so the staff had to force a reboot to get it to respond. Since that gave me an error, I ran the report again. By the time it was over, the db had been offline for most of the day. There was a very unhappy call directly to my desk from the team that ran the db server.

I was working on a legacy application for my ex-company. We were putting live a bunch of logic in some stored procedures to round up the price so it ends with a 9 or 5.
It was supposed to be feature-toggled off, but after 7 hours someone noticed some inconsistencies in prices, 1 pound here, 4 pounds there... Eventually I was like f*********ck. I went and checked the code, and basically I had written the if check the wrong way around.
If the feature is not active, apply the logic, rather than the right way... So yeah, my company had to pay something like £1000 back
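
In Python terms the slip amounts to one missing or extra `not`. A toy sketch, assuming whole-pound prices and guessing at the rounding rule (the real stored procedure isn't shown, so `round_up_to_5_or_9` and `final_price` are purely illustrative):

```python
def round_up_to_5_or_9(price):
    """Round a whole-number price up to the next value ending in 5 or 9."""
    last = price % 10
    if last <= 5:
        return price - last + 5
    return price - last + 9


def final_price(price, feature_enabled):
    # The bug described above is equivalent to writing
    # `if not feature_enabled:` on the next line.
    if feature_enabled:
        return round_up_to_5_or_9(price)
    return price
```

A test that pins down the toggled-off case, i.e. that `final_price(41, False)` stays 41, would have caught the inverted check before it went live.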

This is not actually a coding bug. Once I was writing a REST API and needed an SSH tunnel to connect my API to some third-party APIs. That tunnel server used our company's production server IP on GCP. So I connected to the tunnel, but ran into an issue. I was using the GCP CLI, saw a delete command, thought it would only delete a local copy, and just ran it.

ok guys. thank you 😂😭

Love this Gif, I would be freaking out if I did the same mistake.

Wrote a sql script that took too long to execute. It took a day to rollback the whole transaction. Good that it was internal pre production environment

"UPDATE users SET first_name='Martin'; WHERE user_id=12345"

30,000 rows updated.

Slony replication begins, 100+ remote systems fail under load.

I was writing a music organizer, and a little bug deleted all my music (about 1200 mp3 files). The good part: it took only 25s.

Wow, what's wrong with you guys? I never have any bugs in my code ;)
