So I just wrote my worst bug ever.
Any stories to make me feel better?
Already told this story as a tweet responding to the above, but longer stories are always fun.
I set up an RPi Zero with a webcam to monitor my front door for potential visitors. I wanted it to text me any time someone was at the door. Originally I tried to use my provider's email-to-text gateway, but emails sent from the RPi just ended up looking like spam, and I didn't want to bother setting up an email relay, so I looked into Twilio. Thankfully, they have a nice little script you can use to text a given number; you just have to supply your API key and get a Twilio number to text from. The software I was using, motion, also has a bunch of nifty hooks, including one for when it detects an "event" of motion, the definition of which you can tune. Awesome! So: plug the script into that, restart motion, and start waving my hand in front of the camera!
I then proceeded to get about 500 texts in the span of about a minute.
Eventually, after shutting everything down, I figured out what I did wrong. I had put the script into the on_motion hook instead of the on_motion_event hook. The former fires any time a single frame of motion is detected. Wave your hand in front of the camera a bunch, and you're bound to trigger it many times. So not only did I manage to flood my phone with texts, I also burnt through a good chunk of my Twilio trial credits.
I once was testing sudoers and happened to name the account f*ckyou@companydomain.com. Well, I made a mistake in the sudoers setup, and it sent the traditional "f*ckyou@companydomain.com is not authorized" message to the support email address, which had about 60 people on it, ranging from business folks to Unix admins...
This is pretty great
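Back to the motion story at the top: the difference between a per-frame hook and a per-event hook is easy to see in a small simulation. This is a hypothetical Python model, not motion's actual code. It assumes a run of motion frames counts as one event until event_gap consecutive still frames have passed; the comment only says the event definition is tunable, so measuring the gap in frames here is an assumption.

```python
from typing import Sequence, Tuple


def count_triggers(frames: Sequence[bool], event_gap: int) -> Tuple[int, int]:
    """Return (per_frame_triggers, per_event_triggers) for a sequence of frames.

    frames: True where a frame contains motion.
    event_gap: hypothetical number of still frames that ends an event.
    """
    per_frame = 0
    per_event = 0
    in_event = False
    still_run = 0
    for moving in frames:
        if moving:
            per_frame += 1          # a per-frame hook fires on every motion frame
            if not in_event:
                per_event += 1      # an event hook fires once, when an event starts
                in_event = True
            still_run = 0
        else:
            still_run += 1
            if in_event and still_run >= event_gap:
                in_event = False    # enough stillness: the event is over
    return per_frame, per_event


# Waving a hand: 20 motion frames, a 5-frame pause, repeated 12 times.
waving = ([True] * 20 + [False] * 5) * 12
print(count_triggers(waving, event_gap=30))  # (240, 1)
```

With the per-frame hook wired to an SMS script, that hand-wave is 240 texts; with the per-event hook, it is one.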
Had a similar one. I had just rewritten a bunch of cron jobs that send time-sensitive emails.
There was one cron job that was supposed to run every minute and check the status of a bunch of records that were created 20 minutes before.
When I was testing this little script, I changed the time span in which a record needed to have been created from 1 minute to something like three days, and I forgot to change it back before deploying.
When we deployed the change, everything went fine, because the script wasn't supposed to run until the following morning at around 9 AM.
Well, the morning after, I got to the office and everybody was going crazy because some customers had already received ~30 emails each. I ran to my computer and killed the container that was sending the emails before going through the source code and finding the problem.
I sent around 10k emails in 10 minutes.
Never have I ever written a cron job without a unit test since.
Wait...cron jobs have unit tests?
Yours don't?
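The cron bug was a time window left at a debugging value, which is exactly what a small unit test can pin down. A minimal Python sketch, assuming the job selects records created in a 1-minute slice ending 20 minutes ago; DELAY, WINDOW, and due_records are hypothetical names, not from the original job.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple

# Hypothetical settings mirroring the story: records created ~20 minutes ago,
# in a 1-minute-wide slice.
DELAY = timedelta(minutes=20)
WINDOW = timedelta(minutes=1)

Record = Tuple[str, datetime]  # (record id, created_at)


def due_records(records: Iterable[Record], now: datetime,
                delay: timedelta = DELAY, window: timedelta = WINDOW) -> List[str]:
    """Return ids of records created in [now - delay - window, now - delay)."""
    end = now - delay
    start = end - window
    return [rid for rid, created in records if start <= created < end]


def test_window_is_narrow() -> None:
    """Fails if a debugging value (e.g. three days) is left in WINDOW."""
    assert WINDOW <= timedelta(minutes=1)
    now = datetime(2024, 1, 1, 9, 0)
    records = [
        ("fresh", now - timedelta(minutes=5)),
        ("due", now - timedelta(minutes=20, seconds=30)),
        ("old", now - timedelta(days=2)),
    ]
    # Only the record from ~20 minutes ago should be picked up.
    assert due_records(records, now) == ["due"]


if __name__ == "__main__":
    test_window_is_narrow()
    print("ok")
```

Run in CI, the first assertion alone would have blocked the three-day window from being deployed.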
I was messing with a (dumb) coworker. I told him that ‘rm -rf /’ would fix all the issues in his code. As a joke, I added it to the top of the loader for our web app... and accidentally pushed it to a production server. Long story short, I lost my job and had to rewrite everything.
Yikes! This is exactly why we tell people not to make light of death commands. :P You make that mistake once, and never again!
Such a benign version, but I still remember how super upset I was as a kid and someone told me alt+f4 would refresh the page or something.
"there's this really cool Gameboy cheat code for Tetris, if you press 'Select, Start, B, and A' all at the same time" is a really cool/cruel joke kids of the early 90s would tell each other ...
more related to your benign example (and even more benign): when i was first really tinkering with Linux back in like 2001/2002, i couldn't figure out how to get out of this program i was using (you'd think it would be Vi, but it was probably just a Man page) and a co-worker suggested Alt-F4-- while that did technically get me unstuck from my issue, it took me another day or so to figure out what i'd actually done
Let me guess - erases scores?
it reboots the device!
On IRC it'd be "use /quit bugging me " to ignore someone.
Damn, this is hardcore
Oh man... My heart really sunk for you...
It did fix all code issues though :)
Are you being serious? That is remarkable.
I’m dead serious.
that's the Chicago Fire method of fixing things-- burn it all down, and start over
I once ran rm -rf / in a virtual machine just for fun, to see what would happen. The VM had some disks shared with my actual OS, and it started deleting everything I had on my desktop and in some other important folders.
I was long overdue for a backup, so I ended up losing a crap ton of data, but now I will never forget to check for shared disks when I'm working in a VM.
I think you win this round of "who's made the biggest mistake".
To your credit you identified a number of workflow issues doing that:
I hope you've got a job somewhere more thoughtful now :)
Thank you for trying to credit me, a few issues there haha. I was the primary reviewer, I had worked on everything so long that no one really bothered reviewing my code. We had two git repos, one development & one production. I pushed the commit to the production repo instead of the development one haha.
Epic. I love it.
Many years ago, I tried to use some deeply nested for loops in ActionScript 3.0 code I wrote to find the right combination of values in an XML template to match the hash of a given file. (The goal was to determine what exactly had changed in the file.)
About 20 minutes into waiting for it to finish, I decided to figure out how long it was going to take.
I determined mathematically it would finish on June 18, 2598 at 8:42 AM.
I learned two things that day:
1) One-way hashes are called that for a reason.
2) Deeply nested for loops are the work of the Devil anyway.
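Why the estimate landed in 2598: each nested loop multiplies the number of candidates, and a one-way hash gives no hint about which candidate is "close", so every combination has to be hashed and compared. A small Python sketch of that brute force; SHA-256 and the template format are assumptions, since the story doesn't say which hash or XML layout was involved.

```python
import hashlib
import itertools
from typing import Optional, Sequence, Tuple


def search_space(options_per_slot: Sequence[int]) -> int:
    """Candidates a nested loop visits: the product of every loop's size."""
    total = 1
    for n in options_per_slot:
        total *= n
    return total


def brute_force(template: str, choices: Sequence[Sequence[str]],
                target_hex: str) -> Optional[Tuple[str, ...]]:
    """Try every combination (one 'loop' per slot) until the hash matches.

    itertools.product is equivalent to writing one nested for loop per slot.
    SHA-256 is an assumption; returns None if nothing matches.
    """
    for combo in itertools.product(*choices):
        candidate = template.format(*combo)
        if hashlib.sha256(candidate.encode()).hexdigest() == target_hex:
            return combo
    return None
```

For scale: twelve nested loops of ten values each give search_space([10] * 12) = 10**12 candidates. At a hypothetical million hashes per second, that is 10**6 seconds, or about 11.6 days, and every extra loop of ten values multiplies it by ten again.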
Well how's the script coming along now? Any celebrations planned for 2598?
I don't have that much patience, so I just aborted it instead. :P
I discovered multi-row INSERT the hard way: I tried inserting 250k rows into a remote database with one INSERT command per row, and it was on track to take 5 days, hahaha. With a single INSERT carrying all 250k rows, it took under 5 minutes.
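The speedup comes from sending far fewer statements: one round trip per row adds up fast against a remote database. A minimal Python sketch of chunked multi-row INSERT using the standard-library sqlite3 module; the items table is hypothetical, and a remote server (paying network latency per statement) would show a much bigger gap than a local SQLite file. Chunking is there because databases cap the number of bound parameters per statement.

```python
import sqlite3
from typing import Sequence, Tuple

Row = Tuple[int, str]


def insert_multirow(conn: sqlite3.Connection, rows: Sequence[Row], chunk: int = 400) -> None:
    """Insert rows with INSERT ... VALUES (?, ?), (?, ?), ... statements.

    Each statement carries up to `chunk` rows, so 250k rows become a few
    hundred statements instead of 250k. chunk=400 keeps 2 parameters per row
    under SQLite's older 999-variable limit (an assumption about your build).
    """
    with conn:  # one transaction for the whole load
        for i in range(0, len(rows), chunk):
            batch = rows[i:i + chunk]
            placeholders = ", ".join(["(?, ?)"] * len(batch))
            params = [value for row in batch for value in row]
            conn.execute(f"INSERT INTO items (id, name) VALUES {placeholders}", params)
```

Only the placeholder string is built dynamically; the values themselves still go through parameter binding.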
I stopped the line at a major North American steel producer for the better part of a day.
But how?
It's been over ten years and I don't remember the exact details. It probably had something to do with the fact that I was very much still learning basic SQL (at a co-op/internship kind of thing) and the company didn't have any kind of process controls for pushing to production until very shortly afterward.
Haha that reminds me of my mess-up at my internship. I was working on my first big feature and iteratively pushing code into the master branch as I learned. My code was tagged for release and deployed to production with bugs like links not working properly etc. Not that big of a deal but really embarrassing for me.
I wrote a database migration to add a column and was as happy as a clam! I deployed it to production and thought everything would be fine.
Five minutes later, I realized that my migration also updated every single user record, the database was locked, no one could access prod, and OOPS 🙈
PS. You all did a great job handling the downtime ❤️
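One common way to avoid locking a whole table during a migration like this is to add the column first and then backfill existing rows in small batches, each in its own short transaction. A minimal Python sketch with sqlite3; the users table, the plan column, and the 'free' value are all hypothetical, and how locks actually behave depends on the database engine.

```python
import sqlite3


def backfill_in_batches(conn: sqlite3.Connection, batch_size: int = 1000) -> int:
    """Fill the new `plan` column for existing users, one id range at a time.

    Each batch commits on its own, so no single transaction holds the write
    lock for the entire table. Returns the number of batches run.
    """
    (max_id,) = conn.execute("SELECT COALESCE(MAX(id), 0) FROM users").fetchone()
    batches = 0
    for start in range(1, max_id + 1, batch_size):
        with conn:  # commit this batch before moving on to the next
            conn.execute(
                "UPDATE users SET plan = 'free' "
                "WHERE id >= ? AND id < ? AND plan IS NULL",
                (start, start + batch_size),
            )
        batches += 1
    return batches
```

The plan IS NULL condition makes the backfill safe to rerun if it is interrupted partway through.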
Yasss! I did this too. 😂😂
Network communication bug on a custom ping-pong handshake protocol caused by stale state in an array overflow.
For a little perspective: I was writing a graphical multi-touchscreen front end for an '80s-era commercial flight simulator. The sim host ran an early-'80s real-time AIX environment. Booting the damn thing took about 15 minutes and involved following a 'bootstrap loader' procedure in which you had to fat-finger the boot instructions into a keypad in hexadecimal. Ethernet networking didn't exist when it was built, so we had to have an Ethernet card custom built. On the software end, we were working with an AIX specialist from the UK who designed the protocol, implemented a bare-bones Ethernet driver, and provided a client/server implementation on the host end.
Our AIX guy had done this before on other similar machines, but it was my first journey into network programming. I wouldn't have physical access to the host until integration, so the AIX guy gave me a mini client/server simulator app written in C that I could train my client/server implementation against.
My end was written entirely in C#, where doing layer-2 networking is difficult enough. Either way, I found a pcap wrapper (i.e., SharpPcap) to hack together the networking protocol, translated a floating-point format converter from C (the host didn't use IEEE 754), reverse-engineered a raw dump of the symbol table and loaded it into a DB, etc...
Integration finally came, and I was a nervous wreck. We had 7 full days of downtime to make the changes, including ripping out the old hardware (i.e., ray-tracing displays + 100+ buttons) and replacing it with the new (3 touchscreens). Flight simulators are the backbone of an airline: pilots who can't keep up with the FAA's training requirements are grounded until they can. Therefore, flight simulators typically operate 20 hours/day, 7 days/week. Downtime beyond our scheduled window was not an option.
Months of preparation and death-march sprinting was finally yielding results. Everything was working brilliantly to the point where we could even start identifying opportunities to make adjustments and performance improvements.
That was, until somebody noticed something strange happening on the interface. Instead of updating to show the current state of the host as expected, a small select number of labels were constantly toggling between the correct value and something else. The symptom only occurred on a few pages, and only when the pages were loaded in a specific order.
I racked my brain for hours, chugging coffee and poring over every detail of my networking code, until about 4 AM. That's when I finally managed to catch the AIX specialist taking a break from his VT220 terminal. I picked his brain, going back over the specifics of the networking protocol spec. When that failed to yield results, I started asking him how he had implemented the protocol on the host. That's when I had a sudden 'lightbulb' moment.
It turns out that, on '80s-era hardware running under real-time constraints, re-initializing the state array on his end for every update (i.e., roughly every few hundred milliseconds) is a very expensive operation. To avoid the performance penalty, he allocated the array once at a fixed maximum size and wrote the new values over the existing ones on each update.
On my end, running C# on modern Windows hardware, initializing a fresh array is the cheap and 'safe' approach, so that's exactly what I did. Since I wasn't sending a fixed-length set large enough to blank out the values beyond the new set, stale values could persist in the overflow of the array.
If an old label existed in the overflow (i.e., not set to update) and the same label was present in the new set of values (i.e., set to update), the UI would show either the old or the new value, seemingly at random.
This would only occur under very specific conditions: the current set of labels had to be shorter than the previous set, and the same label had to be present both in the current set and in the overflow. The overflow would persist as long as each new set of labels was shorter than the set the old label had been part of.
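The stale-overflow behavior described above can be reproduced in a few lines. This Python sketch models the host's fixed-size state array (allocated once, overwritten in place) receiving shorter updates from the client; the labels ALT, HDG, and SPD and the buffer size are made up for illustration. The clear_tail option shows one possible fix, blanking the unused slots; sending an explicit length, or always padding updates to the fixed size, would also work.

```python
from typing import List, Optional, Tuple

Slot = Optional[Tuple[str, float]]  # (label, value), or None for an empty slot


class FixedBuffer:
    """Host-style state array: allocated once at a fixed size, overwritten in place."""

    def __init__(self, size: int) -> None:
        self.slots: List[Slot] = [None] * size

    def write(self, update: List[Tuple[str, float]], clear_tail: bool = False) -> None:
        for i, item in enumerate(update):
            self.slots[i] = item          # only the prefix is overwritten
        if clear_tail:
            for i in range(len(update), len(self.slots)):
                self.slots[i] = None      # the fix: blank out the overflow

    def values_for(self, label: str) -> List[float]:
        """Every value currently held for a label; more than one means ambiguity."""
        return [s[1] for s in self.slots if s is not None and s[0] == label]


buf = FixedBuffer(8)
buf.write([("ALT", 1000.0), ("HDG", 90.0), ("SPD", 250.0)])  # longer page
buf.write([("SPD", 260.0)])                                  # shorter page
print(buf.values_for("SPD"))  # [260.0, 250.0]: fresh value plus a stale one
```

That matches the conditions in the story: the new set is shorter than the old one, and SPD appears both in the new set and in the leftover tail, so a reader scanning the array can land on either value.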
Through sheer luck, I managed to get the code patched and working before I left for the hotel. Up until that point, I had worked mostly on the hardware side. I even planned and installed all the hardware for the update before switching back to code.
It's not like this was my first 'aha' troubleshooting moment, but it was the first time I had a full Boris-esque 'I am Invincible!' moment. TBH, I've been hooked ever since.
Honest to god, reading that was a hell of a rush. Write more! That was amazing.
There were a few that occurred in development, so no production environments were affected (that I can remember). However, I did once mistype a sudo rm -rf command... I intended to type ./ for the current directory, but failed to enter the period. So it was only after I hit Enter that I realized I had run it with a / in front of everything. I had several backups, so I only lost a bit of work. I still gave myself a minor heart attack.
That's why you just write . so you can't accidentally delete everything 👀
How about “I’ve done this too.”?
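A dropped period turns ./build into /build, which is an entirely different directory. One defensive habit for scripted cleanup is to check that the target resolves to somewhere strictly inside the directory you meant before deleting it. A minimal Python sketch (Path.is_relative_to needs Python 3.9+); is_inside and safe_rmtree are hypothetical helpers, not a standard tool.

```python
import shutil
from pathlib import Path


def is_inside(target: str, base: str) -> bool:
    """True if `target` resolves to a path strictly below `base`."""
    base_path = Path(base).resolve()
    path = Path(target).resolve()
    return path != base_path and path.is_relative_to(base_path)


def safe_rmtree(target: str, base: str = ".") -> None:
    """Recursively delete `target`, but only if it lives under `base`.

    With the default base (the working directory), './build' is allowed,
    while a mistyped '/build' or '/' resolves outside it and is refused.
    """
    if not is_inside(target, base):
        raise ValueError(f"refusing to delete {Path(target).resolve()}: outside {Path(base).resolve()}")
    shutil.rmtree(target)
```

Resolving both paths first means symlinks and '..' segments can't sneak a target outside the base.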
Fixed a bug in our build system. (Commits made directly on a branch, instead of merged in, weren't being built.)
Our build system relied on itself to perform its own build.
Yup. Around 9000 deploys before we managed to shut it down and pin the build to an older version of itself...