DEV Community

Ben Halpern

What was the worst bug you've ever written?

So I just wrote my worst bug ever.

Any stories to make me feel better?

Top comments (108)

David Muckle

Already told this story as a tweet responding to the above, but longer stories are always fun.

I set up an RPi Zero with a webcam to monitor my front door for potential visitors. I wanted it to text me any time someone was at the door. Originally I tried to use my provider's email-to-text gateway, but emails sent from the RPi just ended up looking like spam, and I didn't want to bother setting up an email relay, so I looked into using Twilio. Thankfully they have a nice little script you can use to text a given number; you just have to supply your API key and get a Twilio number to text from. The software I was using, motion, also has a bunch of nifty hooks, including one for when it detects an "event" of motion, the definition of which you can tune. Awesome! So: plug the script into that, restart motion, and start waving my hand in front of the camera!

I then proceeded to get about 500 texts in the span of about a minute.

Eventually, after shutting everything down, I figured out what I did wrong. I had put the script into the on_motion hook instead of the on_motion_event hook. The former fires any time a single frame of motion is detected. Wave your hand in front of the camera a bunch, and you're bound to trigger that hook many times. So not only did I manage to flood my phone with texts, but I also burned through a good amount of Twilio trial credits.
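A per-frame hook can fire many times per second, so it can help to put a cooldown in front of any paid notification, even once the script sits on the right hook. A minimal Python sketch, assuming a hypothetical `send` callable that wraps the actual SMS call (the Twilio script itself is not shown) and an injectable clock so the behaviour can be tested:

```python
import time
from typing import Callable, List, Optional


class CooldownNotifier:
    """Deliver at most one notification per cooldown window.

    `send` is a hypothetical callable that wraps the real SMS/API call.
    Triggers that arrive inside the window are dropped, so a burst of
    per-frame motion events collapses into a single message.
    """

    def __init__(self, send: Callable[[str], None], cooldown_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self._send = send
        self._cooldown = cooldown_seconds
        self._clock = clock
        self._last_sent: Optional[float] = None  # time of the last delivered message

    def trigger(self, message: str) -> bool:
        """Send `message` unless one was sent less than the cooldown ago."""
        now = self._clock()
        if self._last_sent is not None and now - self._last_sent < self._cooldown:
            return False  # still cooling down: drop this trigger
        self._last_sent = now
        self._send(message)
        return True


if __name__ == "__main__":
    sent: List[str] = []
    fake_now = [0.0]
    notifier = CooldownNotifier(sent.append, cooldown_seconds=60, clock=lambda: fake_now[0])
    # Simulate 500 motion frames spread over roughly 50 seconds.
    for frame in range(500):
        fake_now[0] = frame * 0.1
        notifier.trigger("Someone is at the door")
    print(f"{len(sent)} text(s) sent")  # 1 text(s) sent
```

The cooldown is a safety net, not a substitute for choosing the per-event hook; 60 seconds is an arbitrary example value.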

Doug Stull

I was once testing sudoers and happened to name the test account f*ckyou@companydomain.com. Well, I made a mistake in the sudoers setup, which caused it to send the traditional "f*ckyou@companydomain.com is not authorized" message to the support email address, which had about 60 people on it, ranging from business folks to Unix admins...

Ben Halpern

This is pretty great

Pietro Bongiovanni

Had a similar one. I had just rewritten a bunch of cron jobs that send time sensitive emails.

There was one cron job that was supposed to run every minute and check the status of a bunch of records that were created 20 minutes before.

When I was testing this little script, I changed the time window in which a record needed to have been created from 1 minute to something like three days, and I forgot to change it back before deploying.

When we deployed the change everything went fine because the script wasn't supposed to run till the following morning at around 9am.

Well, the morning after, I got to the office and everybody was going crazy because some customers had already received ~30 emails each. I ran to my computer and killed the container that was sending the emails before going through the source code to find the problem.

I sent around 10k emails in 10 minutes.

Never have I ever written a cron job without a unit test since.
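One way to make a job like this testable is to keep the look-back delay and window width in named constants and check, in a test, that every record is picked up exactly once when the job runs once per minute. A minimal Python sketch under those assumptions; the record shape, `records_due`, and `count_picks` are hypothetical, while the 20-minute delay and 1-minute window come from the story:

```python
from datetime import datetime, timedelta

DELAY = timedelta(minutes=20)  # act on records once they are 20 minutes old
WIDTH = timedelta(minutes=1)   # must match the cron interval (every minute)


def records_due(records, now, delay=DELAY, width=WIDTH):
    """Return records created in the half-open window [now - delay - width, now - delay)."""
    window_end = now - delay
    window_start = window_end - width
    return [r for r in records if window_start <= r["created_at"] < window_end]


def count_picks(width, runs=60):
    """Simulate one cron run per minute and count how often each record is picked."""
    start = datetime(2018, 1, 1, 9, 0)
    # Ten hypothetical records created 30 seconds apart, starting at 09:00.
    records = [{"id": i, "created_at": start + timedelta(seconds=30 * i)} for i in range(10)]
    picks = {r["id"]: 0 for r in records}
    for minute in range(runs):
        now = start + timedelta(minutes=minute)
        for r in records_due(records, now, width=width):
            picks[r["id"]] += 1
    return picks


def test_each_record_is_picked_exactly_once():
    assert count_picks(WIDTH) == {i: 1 for i in range(10)}


def test_forgotten_debug_window_sends_repeats():
    # A three-day window left over from manual testing re-selects the same records every run.
    assert max(count_picks(timedelta(days=3)).values()) > 1


if __name__ == "__main__":
    test_each_record_is_picked_exactly_once()
    test_forgotten_debug_window_sends_repeats()
    print("ok")
```

The `test_*` functions can be collected by pytest or run directly; the second one reproduces the "same customer gets an email every minute" failure.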

Shalvah

Wait...cron jobs have unit tests?

sehe

Yours don't?

Jake Casto

I was messing with a (dumb) coworker. I told him that ‘rm -rf /’ would fix all the issues in his code. As a joke I added it to the top of the loader for our web app... and accidentally pushed it to a production server. Long story short, I lost my job and had to rewrite everything.

Jason C. McDonald

Yikes! This is exactly why we tell people not to make light of death commands. :P You make that mistake once, and never again!

Max Cerrina

Such a benign version, but I still remember how super upset I was as a kid when someone told me Alt+F4 would refresh the page or something.

πš“πš˜πš•πš•πš’ | πš πš’πš•πšŒπš˜

"there's this really cool Gameboy cheat code for Tetris, if you press 'Select, Start, B, and A' all at the same time" is a really cool/cruel joke kids of the early 90s would tell each other ...

πš“πš˜πš•πš•πš’ | πš πš’πš•πšŒπš˜

more related to your benign example (and even more benign): when i was first really tinkering with Linux back in like 2001/2002, i couldn't figure out how to get out of this program i was using (you'd think it would be Vi, but it was probably just a Man page) and a co-worker suggested Alt-F4-- while that did technically get me unstuck from my issue, it took me another day or so to figure out what i'd actually done

Jason C. McDonald

Let me guess - erases scores?

πš“πš˜πš•πš•πš’ | πš πš’πš•πšŒπš˜

it reboots the device!

Ben Sinclair

On IRC it'd be "use /quit bugging me" to ignore someone.

Paula

Damn, this is hardcore

Andy Zhao (he/him)

Oh man... My heart really sank for you...

Jake Casto

It did fix all code issues though :)

Ben Halpern

Are you being serious? That is remarkable.

Jake Casto

I’m dead serious.

πš“πš˜πš•πš•πš’ | πš πš’πš•πšŒπš˜

that's the Chicago Fire method of fixing things-- burn it all down, and start over

Pietro Bongiovanni

I once ran rm -rf / in a virtual machine just for fun, to see what would happen.

The VM had some shared disks with my actual OS and it started deleting everything that I had on my desktop and some other important folders.

I was long overdue for a backup, so I ended up losing a crap ton of data, but now I will never forget to check for shared disks when I'm working in a VM.

Michael Peyper

I think you win this round of "who's made the biggest mistake".

Phil Ashby

To your credit you identified a number of workflow issues doing that:

  • peer review anyone?
  • the code only /exists/ on production (where's the source control)?
  • firing you /before/ asking you to re-write seems pretty short sighted..

I hope you've got a job somewhere more thoughtful now :)

Jake Casto

Thank you for trying to credit me; there are a few issues there, haha. I was the primary reviewer: I had worked on everything for so long that no one really bothered reviewing my code. We had two Git repos, one for development and one for production. I pushed the commit to the production repo instead of the development one, haha.

Dan Fellini

Epic. I love it.

Jason C. McDonald

Many years ago, I tried to use some deeply nested for loops in ActionScript 3.0 code I wrote to find the right combination of values in an XML template to match the hash of a given file. (The goal was to determine what exactly had changed in the file.)

About 20 minutes into waiting for it to finish, I decided to figure out how long it was going to take.

I determined mathematically it would finish on June 18, 2598 at 8:42 AM.

I learned two things that day:

1) One-way hashes are called that for a reason.

2) Deeply nested for loops are the work of the Devil anyway.
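The back-of-the-envelope math behind an estimate like that is simple: the iteration count of nested loops is the product of each loop's range, and dividing by the measured iteration rate gives the runtime. A small Python sketch; the loop sizes and rate below are made-up illustration values, not the ones from the story:

```python
from datetime import datetime, timedelta
from math import prod


def estimate_finish(loop_sizes, iterations_per_second, started_at):
    """Estimate when a set of nested loops will finish.

    The innermost body runs prod(loop_sizes) times, so the runtime grows
    multiplicatively with every extra level of nesting.
    """
    total_iterations = prod(loop_sizes)
    runtime = timedelta(seconds=total_iterations / iterations_per_second)
    return total_iterations, started_at + runtime


if __name__ == "__main__":
    start = datetime(2010, 1, 1, 0, 0)
    # Hypothetical: six nested loops of 100 values each at one million iterations per second.
    total, finish = estimate_finish([100] * 6, 1_000_000, start)
    print(total, finish)  # 1000000000000 2010-01-12 13:46:40
```

Adding a seventh loop of 100 values multiplies the total by 100, turning about 11.6 days into roughly three years.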

Ben Halpern

Well how's the script coming along now? Any celebrations planned for 2598?

Jason C. McDonald

I don't have that much patience, so I just aborted it instead. :P

Bruno Louzada

That's how I discovered multi-row INSERT: I tried inserting 250k rows into a remote database with a separate INSERT command for each row, and it was going to take 5 days, hahaha. With a single INSERT carrying all 250k rows, it took under 5 minutes.
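Batching rows into multi-row INSERT statements cuts the number of statements (and, against a remote server, network round trips) from one per row to one per batch. A minimal sketch using Python's built-in `sqlite3` as a stand-in for the remote database; the `items` table and the batch size are hypothetical, and the batch is kept modest because SQLite caps the number of bound parameters per statement (999 by default in older builds):

```python
import sqlite3


def insert_rows_batched(conn, rows, batch_size=400):
    """Insert (id, name) tuples using one multi-row INSERT per batch.

    With 2 columns and 400 rows per batch, each statement binds 800 parameters,
    which stays under SQLite's historical 999-parameter default limit.
    """
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        placeholders = ", ".join(["(?, ?)"] * len(batch))
        params = [value for row in batch for value in row]
        conn.execute(f"INSERT INTO items (id, name) VALUES {placeholders}", params)
    conn.commit()  # one commit for the whole load instead of one per row


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
    rows = [(i, f"item-{i}") for i in range(250_000)]
    insert_rows_batched(conn, rows)
    print(conn.execute("SELECT COUNT(*) FROM items").fetchone()[0])  # 250000
```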

Dian Fay

I stopped the line at a major North American steel producer for the better part of a day.

marcellothearcane

But how?

Dian Fay

It's been over ten years and I don't remember the exact details. It probably had something to do with the fact that I was very much still learning basic SQL (at a co-op/internship kind of thing) and the company didn't have any kind of process controls for pushing to production until very shortly afterward.

Anthony Orona

Haha that reminds me of my mess-up at my internship. I was working on my first big feature and iteratively pushing code into the master branch as I learned. My code was tagged for release and deployed to production with bugs like links not working properly etc. Not that big of a deal but really embarrassing for me.

Vaidehi Joshi

I wrote a database migration to add a column and was as happy as a clam! I deployed it to production and thought everything would be fine.

Five minutes later I realize that my migration also involved updating every single user record, and the database is locked, and no one can access prod, and OOPS 🙈
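A common way to avoid one long update over every row is to add the column in one step and backfill existing rows in small batches, committing after each. A minimal sketch with Python's `sqlite3` standing in for the production database; the `users` table and `status` column are hypothetical. SQLite locks the whole database while writing, so here batching only shortens how long each write holds the lock:

```python
import sqlite3


def backfill_in_batches(conn, batch_size=1000):
    """Set users.status for existing rows, one id-ordered slice at a time.

    Each slice is its own short transaction, so other queries can get in
    between slices instead of waiting for a single table-wide update.
    Returns the number of slices processed.
    """
    last_id = 0
    batches = 0
    while True:
        ids = [row[0] for row in conn.execute(
            "SELECT id FROM users WHERE id > ? ORDER BY id LIMIT ?", (last_id, batch_size))]
        if not ids:
            break
        # ids are the next `batch_size` ids in order, so BETWEEN covers exactly this slice.
        conn.execute("UPDATE users SET status = 'active' WHERE id BETWEEN ? AND ?",
                     (ids[0], ids[-1]))
        conn.commit()  # end the transaction (releasing the write lock) before the next slice
        last_id = ids[-1]
        batches += 1
    return batches


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users (name) VALUES (?)", [(f"user{i}",) for i in range(2500)])
    conn.commit()
    # Step 1: in SQLite, adding a nullable column does not rewrite existing rows.
    conn.execute("ALTER TABLE users ADD COLUMN status TEXT")
    # Step 2: backfill separately, in batches.
    print(backfill_in_batches(conn, batch_size=1000))  # 3
```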

Vaidehi Joshi

PS. You all did a great job handling the downtime ❤️

Ben Halpern

Shalvah

Yasss! I did this too. 😂😂

Evan Plaice

Network communication bug in a custom ping-pong handshake protocol, caused by stale state in an array overflow.

For a little perspective: I was writing a graphical multi-touchscreen front end for an '80s-era commercial flight simulator. The sim host ran an early-'80s real-time AIX environment. Booting the damn thing took about 15 minutes and involved following a 'bootstrap loader' procedure whereby you'd have to fat-finger the boot instructions into a keypad in hexadecimal. Ethernet networking didn't exist when it was built, so we had to have an Ethernet card custom built. On the software end, we were working with an AIX specialist from the UK who designed the protocol, implemented a bare-bones Ethernet driver, and provided a client/server implementation on the host end.

Our AIX guy had done this before on other similar machines, but it was my first journey into network programming. I wouldn't have physical access to the host until integration, so the AIX guy gave me a mini client/server simulator app written in C that I could test my client/server implementation against.

My end was written entirely in C#, where trying to do layer-2 networking is difficult enough. Either way, I found a PCAP wrapper (i.e., SharpPcap) to hack together a networking protocol, translated a floating-point format converter from C (the host didn't use IEEE 754), reverse-engineered a raw dump of the symbol table and loaded it into a DB, etc...

Integration finally came and I was a nervous wreck. We had 7 full days of downtime to make the changes, including ripping out the old hardware (i.e., ray-tracing displays and 100+ buttons) and replacing it with the new (3 touch screens). Flight simulators are the backbone of an airline: pilots who can't keep up with the FAA's training requirements are grounded until they can. Therefore, flight simulators typically operate 20 hours/day, 7 days/week. Downtime beyond our scheduled window was not an option.

Months of preparation and death-march sprinting was finally yielding results. Everything was working brilliantly to the point where we could even start identifying opportunities to make adjustments and performance improvements.

That was, until somebody noticed something strange happening on the interface. Instead of updating to show the current state of the host as expected, a small select number of labels were constantly toggling between the correct value and something else. The symptom only occurred on a few pages, and only when the pages were loaded in a specific order.

I racked my brain for hours, chugging coffee and poring over every detail of my networking code, until about 4 AM. That's when I finally managed to catch the AIX specialist taking a break from his VT220 terminal. I picked his brain, going back over the specifics of the networking protocol spec. When that failed to yield results, I started asking him how he had implemented the protocol on the host. That's when I had a sudden 'lightbulb' moment.

It turns out that, on '80s-era hardware running under real-time constraints, re-initializing the state array on his end for every update (i.e., roughly every couple hundred ms) is a very expensive operation. To avoid the performance penalty, he allocated the array once at a fixed maximum size and overwrote the existing values with the new values on each update.

On my end, running C# on modern Windows hardware, initializing a fresh array is the cheap and 'safe' approach, so that's exactly what I did. Since I wasn't sending a fixed-length set large enough to blank out the values beyond the new set, stale values could persist in the overflow of the array.

If an old label existed in the overflow (i.e., not set to update), and the same label was present in the new set of values (i.e., set to update), the UI would read either the old or the new value at random.

This would only occur under very specific conditions. The set of current labels had to be shorter than the previous set of labels. The same label had to be present in the current set and in the overflow. The overflow would persist as long as each new set of labels was shorter than the set the old label was contained in.
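The failure mode is easy to reproduce in miniature: a buffer allocated once at a fixed size and overwritten in place keeps whatever a shorter update didn't reach. A minimal Python sketch; the capacity, the `FixedBuffer` class, and the (label, value) pairs are hypothetical, and `pad=True` models sending a fixed-length set that blanks out the tail:

```python
from typing import List, Optional, Tuple

Entry = Tuple[str, float]  # hypothetical (label, value) pair, e.g. ("HDG", 90)


class FixedBuffer:
    """Allocated once at full size, then overwritten in place on every update."""

    def __init__(self, capacity: int = 8):
        self.slots: List[Optional[Entry]] = [None] * capacity

    def write(self, entries: List[Entry], pad: bool) -> None:
        for i, entry in enumerate(entries):
            self.slots[i] = entry
        if pad:
            # Blank every slot past the new data so nothing stale survives.
            for i in range(len(entries), len(self.slots)):
                self.slots[i] = None
        # Without padding, slots past len(entries) keep values from older, longer updates.

    def read(self) -> List[Entry]:
        return [entry for entry in self.slots if entry is not None]


if __name__ == "__main__":
    buf = FixedBuffer()
    buf.write([("ALT", 1000), ("SPD", 250), ("HDG", 90)], pad=False)  # page A
    buf.write([("HDG", 95)], pad=False)                              # page B, shorter
    print(buf.read())  # [('HDG', 95), ('SPD', 250), ('HDG', 90)]: HDG appears twice

    buf = FixedBuffer()
    buf.write([("ALT", 1000), ("SPD", 250), ("HDG", 90)], pad=True)
    buf.write([("HDG", 95)], pad=True)
    print(buf.read())  # [('HDG', 95)]
```

In the unpadded case a consumer that maps labels to values sees `HDG` twice, once stale and once fresh, so which value is displayed depends on how it resolves the duplicate.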

Through sheer luck, I managed to get the code patched and working before I left for the hotel. Up until that point, I had worked mostly on the hardware side. I even planned and installed all the hardware for the update before switching back to code.

It's not like this was my first 'aha' troubleshooting moment, but it was my first full Boris-esque 'I am invincible!' moment. TBH, I've been hooked ever since.

Max Cerrina

Honest to god, reading that was a hell of a rush. Write more! That was amazing.

Forest Hoffman

There were a few that occurred in development, so no production environments were affected (that I can remember). However, I did once mistype a sudo rm -rf command...

I intended to type ./ for the current directory, but failed to enter the period. So, it was after I hit enter that I realized I had run it with a / in front of everything. I had several backups, so I only lost a bit of work.

I still gave myself a minor heart attack.
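A small guard can turn a typo like this into an error instead of a disaster. A Python sketch, not a replacement for `rm`; the protected set here is a hypothetical minimum (the filesystem root and the home directory), and `is_protected`/`guarded_rmtree` are illustrative names:

```python
import shutil
import tempfile
from pathlib import Path


def is_protected(target) -> bool:
    """True for paths that should never be deleted recursively."""
    path = Path(target).resolve()
    protected = {Path(path.anchor), Path.home().resolve()}  # e.g. "/" and the home directory
    return path in protected


def guarded_rmtree(target) -> None:
    """Like shutil.rmtree, but refuses protected paths."""
    if is_protected(target):
        raise ValueError(f"refusing to delete {Path(target).resolve()}")
    shutil.rmtree(Path(target).resolve())


if __name__ == "__main__":
    print(is_protected("/"))  # True
    scratch = Path(tempfile.mkdtemp())
    (scratch / "build").mkdir()
    guarded_rmtree(scratch / "build")
    print((scratch / "build").exists())  # False
    shutil.rmtree(scratch)  # clean up the temporary directory
```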

Leah

That's why you just write . so you can't accidentally delete everything πŸ‘€

Christopher McClellan

How about “I’ve done this too.”?

Fixed a bug in our build system. (Commits direct on a branch instead of merging weren’t building.)
Our build system relied on itself to perform its own build.
Yup. Around 9000 deploys before we managed to shut it down and pin the build to an older version of itself...

Mohannad Najjar

Two months ago I was a student and also a part-time developer for the university's main student-services web app.

I deployed some updates and left the office for my lectures. Four hours later, I noticed students talking to each other about something so messed up and crazy on the web app.

I got back to the office, and realized that I accidentally changed the production database connection to the testing database.

Angshuman Halder

So I wrote a bug where, when you try to pull some data from the main server's DB into your local server, the data gets deleted from the main server and updated in the local server's DB.

Phil Ashby

Seems like a perfectly reasonable distributed storage/sharding solution... :)

Drew

Oh man, I have a good one for this.

Full disclosure: I'm actually an FPGA engineer rather than a software engineer. VHDL and Verilog may be hardware description languages, but they are close enough to code that I get good ideas from here.

My goofs usually result in damaged components, usually pretty expensive ones.

This is one of those stories.

I work for a medical device company, and my main project for the last 9 months or so has been a subsystem that precisely controls 100+ custom, purpose-built solenoids. Each solenoid is individually controlled by a PWM signal, and ours generally take an approx. 30% duty cycle (meaning the signal spends 30% of the time interval on, 70% off). You can give them a 100% duty cycle signal if it lasts less than a minute. We take advantage of this to move them faster, for very short periods of time (approx. 1 millisecond). The metaphor for this is the gas pedal on your car: you generally don't floor it, but you can for brief periods of time when you need to.

Due to some bad luck and some faulty reset logic I wrote, approx. 15 minutes after power-up each solenoid started receiving a PWM signal that was ~15 minutes on, ~15 minutes off. To return to the car metaphor, this would be like flooring it until your engine explodes. It toasted every single solenoid in the damn thing. In addition, this happened after the first time we installed it in the larger system, which is a ~1 hour process involving 3 people and a forklift, plus downtime on the larger machine.
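The duty-cycle arithmetic, and the kind of time limit that keeps a stuck "floor it" request from lasting minutes, fit in a few lines. This is Python for readability (on the actual device this logic would be written in VHDL or Verilog), and the `DutyLimiter` class, the 1 ms boost budget expressed in microseconds, and the tick size are hypothetical:

```python
import math


def on_time_us(period_us: float, duty: float) -> float:
    """Time the PWM output spends high in one period (30% duty -> 30% of the period)."""
    return period_us * duty


class DutyLimiter:
    """Allow a full-duty boost only for a bounded time, then fall back to nominal.

    Integer microsecond ticks avoid floating-point drift in the elapsed-time counter.
    """

    def __init__(self, nominal: float = 0.30, max_boost_us: int = 1000):
        self.nominal = nominal
        self.max_boost_us = max_boost_us
        self.boost_elapsed_us = 0  # continuous time spent above nominal duty

    def step(self, requested: float, dt_us: int) -> float:
        """Return the duty cycle actually applied for the next dt_us microseconds."""
        if requested <= self.nominal:
            self.boost_elapsed_us = 0  # boost ended: reset the budget
            return requested
        self.boost_elapsed_us += dt_us
        if self.boost_elapsed_us > self.max_boost_us:
            return self.nominal  # boost budget exhausted: ignore the request
        return requested


if __name__ == "__main__":
    print(math.isclose(on_time_us(100, 0.30), 30.0))  # True
    limiter = DutyLimiter()
    applied = [limiter.step(1.0, dt_us=100) for _ in range(15)]
    print(applied)  # ten 1.0 values, then five 0.3 values
```

The limiter is independent of the reset logic, so even if a reset goes wrong, a full-duty request cannot outlast its budget.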

Oops.

Adam McKerlie

Not technically a bug, but I was once testing SQL injection on our dev site and dropped our database with DROP DATABASE dev;. It worked, I patched the form, and then I started receiving emails saying the site was down.

I wasn't on dev, and our production database was called dev.
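The usual patch for a form like that is a parameterized query, which sends user input as data instead of splicing it into the SQL text. A minimal sketch with Python's `sqlite3` (the `users` table and both functions are hypothetical). Because `sqlite3`'s `execute()` runs only one statement, the demo uses an always-true `OR` payload rather than a stacked `DROP DATABASE`:

```python
import sqlite3


def find_user_unsafe(conn, name):
    # Vulnerable: the input becomes part of the SQL text itself.
    return conn.execute(f"SELECT id FROM users WHERE name = '{name}' ORDER BY id").fetchall()


def find_user_safe(conn, name):
    # Parameterized: the ? placeholder binds `name` as a value, never as SQL.
    return conn.execute("SELECT id FROM users WHERE name = ? ORDER BY id", (name,)).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])
    payload = "x' OR '1'='1"
    print(find_user_unsafe(conn, payload))  # [(1,), (2,)]: every row leaks
    print(find_user_safe(conn, payload))    # []: no user is literally named that
```

PHP's PDO offers the same placeholder mechanism through prepared statements.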

Michael Minshew

Why on earth would anyone name their prod env dev..... ???!?!?!?

Adam McKerlie

My guess is after initially developing the app, instead of wanting to change the variable name they just recreated the DB in prod.

Best part, dev was called dev_dev

Maccabee

I feel like dev and prod DBs shouldn't even be on the same machine.

Adam McKerlie

They weren't. I was SQL-injecting a PHP app and accidentally did it on prod instead of dev.

Bruno Louzada

LOL

Paceaux

Worst thing I ever did was before I was a developer. I was trying to debug an issue with a product line in our content management system. We had Canadian products, US products, and then a small set of US products sold in Canada. That small set of US-in-Canada products weren't displaying images. So, I republished something from the CMS, and fixed it!

2 minutes later, every phone in the department was ringing, as the entire US product line was down. It was a Friday, and the busiest day of the month for traffic. On the one hand, I could argue that it wasn't my fault. Apparently there was a bug in how products were connected, and I'd just uncovered it. On the other hand, no one cares, the entire US product line is gone.

So, I quickly unpublished a thing, and the US product line was back. At that point I went into the director's office and confessed what I had done and why it had happened. (The CMS had no archiving and no staging environment, so theoretically I could've gotten off scot-free.) The director said, "Everyone gets one; this was yours."

As a developer... I crashed codewars.com. I mistakenly wrote an infinite recursive loop in a code challenge. The browser got slow and unresponsive. Then the page froze up. I went to check the health of codewars.com (they actually had a health-check page somewhere), and it was reporting slow. Then it... was down.

Shalvah

You crashed a site from your desk. Badass! 🙌🙌😂😂