DEV Community

Tell me a bug story

Ben Halpern on June 05, 2019

Bugs are inevitable; debugging is painful, but the experiences make us better developers.

So let's hear it! Tell us about some of the bugs you've encountered, and how you dealt with them.

Federico Ramirez

I had a bug and tried to fix it using threads. Nnoww II hhaavvee ttwwoo bbuugss.

briwa

Haha yup, just like what they said when solving a problem with Regex, now you have two problems...

Riccardo Bernardini

Regex are great! ☺ Seriously, I think I am one of the few that really like them.

I agree that as they get a bit complex, some regexes look like line noise (I always try to find a less obscure syntax), but for searching and extracting text they are really powerful.

Phil Nash

I used to work on a site that you could log into with your Instagram account, pick a bunch of pictures and buy them as fridge magnets.

This mostly worked well, except for the occasional pack where some or all of the images would fail to download. We spent ages working with the code that downloaded the images, trying to find where the bug was. (I feel more confident with downloading images in Ruby now!)

Ultimately, we decided the code that was downloading the images wasn't the issue, so perhaps it was the Instagram API? Or a flaky connection from our server?

More investigation led to the discovery that on occasion a user would just delete their picture from Instagram, leading to our failed download.

So, we moved the downloading from on demand when the print job was run to a background worker once the user made their purchase. Jobs would still occasionally fail.

We moved the job to before the user even completed their purchase. This helped, but jobs would still occasionally fail.

I'm not even sure you could call it a bug in the end. Some users were uploading pictures to Instagram just to get them printed and then deleting them immediately. It didn't matter how many workers we ran against the job queue; there was always a user who was faster at deleting their images.

The eventual fix was loosening the Instagram restriction, allowing users to use Facebook photos or upload photos from their computer or phone instead. When users no longer had to use only Instagram to get their images onto our site, things became better. This was more work for us (Instagram photos were all square at the time, which fit the magnets; opening up to non-square photos meant we needed an image cropper and just a lot more UI) but was better for the user.

Am I calling users bugs here? Of course not! But understanding the ways that user actions can affect the way your site works is just as important as an esoteric language exception. And if something is failing, there are more ways to fix it than just inspecting the code.

Kamran Ayub

Recently we had a performance issue on a web app I was working on. We showed a series of questions where one question's answer may lead to a few new ones showing up.

The issue was that after answering the first question, answering the second question took 2 MINUTES to render the next set of questions. The UI was blocked the whole time.

I started to dig in. Our app let users manage 50 different products at once so this issue only started to manifest when you managed a ton of products. I also counted that the second question triggered 13 additional questions to show up.

When I started to log the number of Redux actions being dispatched, I was amazed to see we were dispatching (and re-rendering on) 6000+ actions, which is what was causing the slowdown. Each action dispatch took about 20ms (× 6000 ≈ 2 minutes).

The questions had a graph structure where questions could relate to one another. Turns out we were following this graph structure and dispatching actions even if no actual data was changing, so I updated the logic to compare previous values; that cut it down to about 650 actions (50 products x 13 questions) which is what it took to make the new questions visible.

This reduced the time from 2 minutes to 25 seconds. Because the rest of the actions were technically needed to change state, I introduced the redux-batched-actions package to batch all those actions and dispatch one single action. Doing so reduced the time down to about 2 seconds. Much better!
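The two fixes described above (skipping updates that change nothing, then batching the rest into one notification) can be sketched outside Redux. This is only a minimal Python illustration, not the app's code: `Store`, `set_answer`, and `apply_batch` are hypothetical names, and a "render" is modeled as a subscriber callback.

```python
from typing import Callable, Dict, List, Tuple


class Store:
    """Tiny state container; every _notify() stands in for one re-render."""

    def __init__(self) -> None:
        self.state: Dict[str, object] = {}
        self.subscribers: List[Callable[[], None]] = []

    def _notify(self) -> None:
        for callback in self.subscribers:
            callback()

    def set_answer(self, key: str, value: object) -> bool:
        """Apply one update, but skip it (and the render) if nothing changes."""
        if key in self.state and self.state[key] == value:
            return False
        self.state[key] = value
        self._notify()
        return True

    def apply_batch(self, updates: List[Tuple[str, object]]) -> int:
        """Apply many updates and notify subscribers once at the end."""
        changed = 0
        for key, value in updates:
            if key in self.state and self.state[key] == value:
                continue  # previous value is identical: nothing to do
            self.state[key] = value
            changed += 1
        if changed:
            self._notify()
        return changed


renders = []
store = Store()
store.subscribers.append(lambda: renders.append(1))

# 50 products x 13 questions, mirroring the numbers in the story.
updates = [(f"product{p}/question{q}", True) for p in range(50) for q in range(13)]
changed = store.apply_batch(updates)
repeat = store.apply_batch(updates)  # identical values: no work, no render
```

In this sketch the 650 real changes cost a single notification, and replaying the same values costs none.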

Eventually, I discovered that what was adding significant time was the JS ... spread operator. What! Turns out that because we needed to support IE11, the spread operator was being polyfilled, and this implementation was slow as heck. We switched some critical code to using assign instead, and it reduced one function's total execution time from 1s to 12ms.

Overall, I got it down from 2 minutes to <100ms by doing these optimizations and simplifying the complexity of some functions to faster O(1) and O(N) implementations.

What a doozy! That took a good week to work through but the improvements were to core code in the app so the entire app benefited from it!

Phil Nash

Wow! That is quite the performance improvement. If only redux warned if that many actions were dispatched, it feels to me like 6000 actions for one request is unlikely to ever be on purpose!

Nice job tracking it all down though. 2 minutes to under 100ms is amazing!

Dave Jacoby

We are a lab in a school that does science for other labs. We had a page that had been working fine for most people for a while, but we got a report that someone was trying to use this page and the links wouldn't work, which meant that user couldn't do her work.

I take a look, click links, and everything looks fine. And then I look at the logs, and by process of elimination, I figure out that this user is a Mac user using Safari.

I take a look at the source, and ...

We all know that URIs are (protocol)(server)(path), and this page would be (https://)(dev.to)(/ben/tell-me-a-bug-story-59e2), and you can do "absolute" links with just the (path), but did you know that you can include URLs that are protocol and path without server?

No, you didn't. Because that is an abomination before Tim.

But someone did. Someone, or a significant number of someones, did this, enough to make it so IE accepted it. And Firefox. And Chrome. But Safari didn't.

This was hard for me to test, because I have almost no Apple in my life.

So, to clarify: The bug isn't that we use this abomination (although that's what I fixed) but that it is industry standard for browsers to accept this, and Safari didn't. Be liberal in what you accept, I guess.
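To make the odd link shape concrete, here is a small Python sketch using the standard library's URL parser. The exact link on the page is not given, so the `https:/path` form below is an assumption about what "protocol and path without server" looked like.

```python
from urllib.parse import urlsplit

# Assumed shape of the link: scheme and path, but no server (netloc).
odd_link = "https:/ben/tell-me-a-bug-story-59e2"
normal_link = "https://dev.to/ben/tell-me-a-bug-story-59e2"

odd = urlsplit(odd_link)
normal = urlsplit(normal_link)

# The odd link parses with a scheme and a path, and an empty netloc.
print(odd.scheme, repr(odd.netloc), odd.path)
print(normal.scheme, repr(normal.netloc), normal.path)
```

A client that insists on a non-empty server component has nothing to connect to here, while a lenient one has to borrow the server from the page the link appears on.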

So, we:

  1. changed our code so we didn't continue in sin
  2. I created an Apple Dev account so I had access to the issue tracker, and when I tried to submit this bug, the bug tracker crashed. :shrug:

Jack Harner 🚀

I created an Apple Dev account so I had access to the issue tracker, and when I tried to submit this bug, the bug tracker crashed. :shrug:

That's the best part of this one 😂

I definitely feel your pain on debugging Apple issues. I have access to an iPod Touch at work, but that's about it. Debugging on that without a Mac to connect it to is basically just trial and error.

Ben Halpern

I’ve never been heavily involved in the Apple ecosystem but I’ve been around enough to feel this pain 😭

Ken Bellows • Edited

Several years ago, I was working on a python application. Tbh I don't remember much of anything about the actual program, but it's not important. I started seeing some weeeeiiirrdd behaviors in a particular function after the first time it was called. Let's pretend this was the function:

def add_score(obj={'type': 'default', 'score': 0}):
    obj['score'] += 10
    return obj

The code calling the function in question often called it without supplying an argument, and expected to get back the same result every time:

add_score() #=> {'type': 'default', 'score': 10}

But what was actually happening was much stranger:

agent_1 = add_score()
agent_1['type'] = 'red'

agent_2 = add_score()
agent_2['type'] = 'blue'

agent_3 = add_score()
agent_3['type'] = 'orange'

print(agent_1)
print(agent_2)
print(agent_3)

What would you expect to see? I expected this:

{'type': 'red', 'score': 10}
{'type': 'blue', 'score': 10}
{'type': 'orange', 'score': 10}

But what I actually got was this:

{'type': 'orange', 'score': 30}
{'type': 'orange', 'score': 30}
{'type': 'orange', 'score': 30}

⁉⁉😵⁉⁉

It took me SO LONG to understand the problem. The problem is that the default object in the function signature, the {'type': 'default', 'score': 0} object, is parsed and defined at function definition time, and it exists in the scope surrounding the function, when I thought it was defined each time you called the function, within the function scope. NOPE! 🤦‍♂️

So every time I called the function with no arguments, it was operating on and returning the same object! So all the agent_x variables in the code up there are referring to the same thing!!!
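For readers who hit the same thing, the conventional workaround is to default to None and build the dict inside the function. This is a sketch of that idiom applied to the pretend function above, not the code from the original program:

```python
def add_score(obj=None):
    # Build a fresh default dict on every call instead of sharing one
    # object that was created when the function was defined.
    if obj is None:
        obj = {'type': 'default', 'score': 0}
    obj['score'] += 10
    return obj


agent_1 = add_score()
agent_1['type'] = 'red'

agent_2 = add_score()
agent_2['type'] = 'blue'

print(agent_1)
print(agent_2)
```

With this version each call without arguments gets its own dict, so agent_1 stays red with a score of 10 and agent_2 stays blue with a score of 10.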

Oh my god the amount of time I wasted on this bug... but on the plus side, the very first article I wrote on dev.to was on this very bug (and how JavaScript's default parameters work the way I had expected Python's to work), so it sorta got me into tech blogging! Thanks bug!

Ben Halpern

We had a bug in the DEV sign-in process forever: it would randomly show an "invalid credentials" error (or something like that), which was the error message being passed back in the OAuth response. It was very uncommon, but never went away.

It turned out it wasn't invalid credentials, but the error message was actually just more of a catch all for "something went wrong". It turned out to be a timeout error. It sometimes took more than ~9 seconds to create an account and the request timed out.

That was a doozy. @andy mostly figured it out.

Manda Putra

Oh it is, when my connection is slow I always get that Invalid Cred message 😂

Andy Zhao (he/him)

Oh yeah!! That bug was a trip. 😥

Vince Ramces Oliveros

I still experience the same thing when I sign in with my GitHub account. I do want to submit an issue, but... It's a feature

Andy Zhao (he/him)

We have some recently reported bugs about sign in issues, but each case has been a little bit different so far.

If you're having issues with your account, feel free to submit an issue for any bugs or feature requests to the repo: github.com/thepracticaldev/dev.to

We also provide support via our email: yo@dev.to

Serge Stupachenko • Edited

Sometimes bugs can be fun.

I used to work on a geo-aware mobile app a while ago. We had an offshore QA team located in India.

They reported a bug about inconsistent behavior between iOS and Android apps. The restaurant was showing as open in one app and as closed in another.

We spent about a day trying to figure out what's wrong and about an hour in a conference call with them. Nobody was able to reproduce it again, so it got deferred.

As we figured out later, they ran their test at 8:59 PM on one device, when the restaurant was still open. Their second run on another device was done at 9:01 PM, when the restaurant was actually closed.

The funny part is the name of the restaurant - "The Blind Pig." 😅

A few years later, we developed a tool aimed to help distributed teams to inspect and debug mobile apps faster.

briwa • Edited

The recent one was about Highcharts. You were supposed to have an animation when hovering the mouse on the legend.

jsfiddle.net/y92haq35 (somehow the fiddle can't be embedded)

It was fine in an isolated environment, but somehow the animation didn't appear on our page. I thought it was a configuration issue, so I copied and pasted the exact same config into our page. It didn't appear there either. I was having trouble inspecting the styles because the animation isn't triggered by CSS, but by JS.

I inspected the source code, but every hover class was firing properly. I tried it on a different page in the app, and the animation was there. So something was causing it on the original page.

After painstakingly removing the components/modules on that page one by one to see which one was causing the problem, I found out that there was a line of CSS that goes like this:

/*
* (a note about a bug its trying to fix)
*/
.highcharts-series-hover {
  opacity: 1 !important;
}

Basically this line says all Highcharts series should have an opacity of 1, so even if the animation kicks in, this line overrides it (with the !important) so that it looks like there is no animation. Should've fixed the actual bug from the issue tracker...

And that concludes 3 hours of debugging. I think I didn't do a good job debugging it, any suggestions? 😂

On another note, how do you all prevent these kinds of CSS bugs? Visual regression test? Eye test?

I'm Luis! \^-^/

I was using one of MaterializeCSS's date pickers. This one has a prop called minDate which allows you to set what's the minimal date for this selection. This worked wonderfully.

I was passing the variable to Materialize's date picker, which was a date object. This date object had the date and time selected by the user, so Materialize was working properly.

The problem was sending the data to the server. The date variable was somehow changing. It was losing the time. Time was 00:00:00.
I checked my code, from the very beginning, down to my Redux store. Everywhere. My date variable was OK and it HAD the proper time.
I spent hours checking why it was changing. Is JavaScript suddenly crazy? Why is this happening to me? Could it be a bug IN JavaScript?

Turns out Materialize was mutating the date object. The solution was just cloning it: minDate = new Date(selectedDate)

That fixed the issue. Lesson learned: be careful with mutation.
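The same defensive-copy idea can be sketched in Python. Python's own datetime objects are immutable, so this example uses a hypothetical mutable `MutableDate` class as a stand-in for JavaScript's Date, and `picker_set_min_date` is an invented placeholder for a widget that resets the time on whatever object it receives.

```python
import copy
from dataclasses import dataclass


@dataclass
class MutableDate:
    """Stand-in for a mutable date object such as JavaScript's Date."""
    year: int
    month: int
    day: int
    hour: int = 0
    minute: int = 0


def picker_set_min_date(date: MutableDate) -> None:
    """Hypothetical widget call that zeroes the time on the object it is given."""
    date.hour = 0
    date.minute = 0


selected = MutableDate(2019, 6, 5, hour=14, minute=30)

# Hand the widget a copy so it cannot change the caller's object.
picker_set_min_date(copy.copy(selected))

# Without the copy, the caller's object loses its time.
shared = MutableDate(2019, 6, 5, hour=14, minute=30)
picker_set_min_date(shared)
```

After this runs, `selected` still carries 14:30 while `shared` has been reset to 00:00.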

Tim Downey

I work on a Ruby API that acts as a core piece of the control plane for an open-source PaaS platform called Cloud Foundry. Our users install and operate the platform on the infrastructure of their choice (on-prem vSphere, AWS, GCP, Azure, etc.) and everyone tends to use it a little bit differently. This leads to lots of possible configurations and makes certain types of bugs hard to triage and even harder to reproduce.

One bug (or unforeseen usage pattern) we had seems really obvious in hindsight, but ended up taking weeks of investigation. We had some users report that their APIs were consuming huge amounts of memory and every six minutes would reach ~8GB of RAM usage and restart. Now Ruby isn't the most lightweight programming language, but it shouldn't be that bad! We initially expected a bad memory leak, but pausing the interpreter and manually forcing garbage collection was able to free up most of the memory. So we ended up crawling through heap dumps (wrote a blog post on this process with my team) and eventually found out there were tons and tons of User model objects in memory.

Turns out that this installation had all of their users (10,000+) belonging to the same organizational unit (called a Space) and we had a frequently-accessed line of code that was loading this full array of users into memory every time an API endpoint was hit. It was simply trying to do an existence check to see if a particular user was a member of the Space in question, but because of how we used our ORM it was instantiating and loading all users within that space into memory. Since our test environments (and many other production environments) tend to only put dozens or hundreds of users in a Space we hadn't encountered this.

The fix ended up being super simple:
Do the existence check in SQL instead of in Ruby

😂
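The difference between the two approaches can be sketched with Python's built-in sqlite3 module. This is not the Cloud Foundry code or its ORM: the `space_users` table and both helper functions are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE space_users (space_id INTEGER, user_id INTEGER)")
conn.executemany(
    "INSERT INTO space_users VALUES (?, ?)",
    [(1, user_id) for user_id in range(10_000)],
)


def is_member_slow(space_id: int, user_id: int) -> bool:
    # Loads every user in the space into memory just to test membership.
    rows = conn.execute(
        "SELECT user_id FROM space_users WHERE space_id = ?", (space_id,)
    ).fetchall()
    return any(row[0] == user_id for row in rows)


def is_member_fast(space_id: int, user_id: int) -> bool:
    # Lets the database answer yes/no; only a single value comes back.
    row = conn.execute(
        "SELECT EXISTS("
        "SELECT 1 FROM space_users WHERE space_id = ? AND user_id = ?)",
        (space_id, user_id),
    ).fetchone()
    return bool(row[0])
```

Both functions return the same answer, but the first materializes 10,000 rows per call while the second transfers one.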

Daniel Shotonwa

I worked with a team and one of my teammates had a bug with an editor package in React, he struggled just to add a placeholder to that editor. He literally tried to solve the bug for over 3 days and even reached out to me for solutions all to no avail. One night I was just thinking about the bug and added placeholder as a prop and it worked.

Javier Guerra

My favorite story is the time we were preparing for a big presentation on a new feature, which involved rendering lots of things with Handlebars in the browser. Everything was working perfectly. We showed it to the manager, he liked it, and he asked us for a rehearsal of the presentation later that day.

During the rehearsal, the page stopped working. The browser didn't render a thing. After some debugging, it turned out that the machine where the presentation was happening had done an automatic Chrome update that bloated the GPU usage, but only on OS X Yosemite. It took us many hours to find this out. But the fix was easy: use another machine for the presentation, and wait a couple of weeks for another Chrome update.

Dave Jacoby

Batch processing is for when the job can take days to weeks. Our system uses Torque, which redirects STDOUT to (task name).o(process id) and STDERR to (task name).e(process id).

In Torque, you can use -n to pass that name, which allows you to send data to the shell script that does everything, like the project id that you're about to package into a 5TB tarball.

So, that happened, and I wanted to see the output:

code 1234.o123456789000 1234.e123456789000

This gave me a VSCode tab for 1234.o123456789000 and an empty tab titled "Infinity".

Code is Electron, which is JS and CSS, and if you give JS a number bigger than it can handle, it says Infinity. And Code uses Minimist to handle ARGV and didn't specify that the standard entry is a string. I get it, because your standard suffixes will trip the number detection tools.

But 1234.e123456789000 can be read as 1234.0 * 10 ** 123456789000, and that's a really big number.

Fixed by adding ,"_". Bug report and commit available on request.
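The same trap can be reproduced in Python, whose float parser also accepts exponent notation and overflows to infinity. This is only an analogy for what happened inside the argument parser: `looks_numeric` is a hypothetical helper, not Minimist's actual detection logic.

```python
import math

stdout_name = "1234.o123456789000"
stderr_name = "1234.e123456789000"


def looks_numeric(text: str) -> bool:
    """Naive number detection, similar in spirit to what a CLI parser might do."""
    try:
        float(text)
        return True
    except ValueError:
        return False


print(looks_numeric(stdout_name))  # "o" is not part of any number syntax
print(looks_numeric(stderr_name))  # "e" starts an exponent
print(float(stderr_name))          # far too large for a double
```

Only the `.e` file name passes the numeric check, and converting it yields infinity, which matches the tab title in the story. Treating positional arguments as strings avoids the conversion entirely.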

Thomas C. Haflich

This one was a bug with a third party vendor. I found the bug for them, and it probably remains unfixed to this day.

We were sending off sheets to be printed, and these sheets included randomly generated unique passcodes of five alphanumeric digits (that is, matching /^[a-z0-9]{5}$/). We had users log into a portal with these passcodes, along with other authentication information. For years, this was the case without issue.

Then one day, I get a support call...

CALLER: Hi, I can't log onto the portal. It says it can't find me?
ME: Hold on one second, let me look you up.
  [I get the CALLER's information and search for them in the database.]
ME: Okay. It looks like your information is all correct on my end. 
ME: Are you seeing any error messages on the screen?
  [We go through the standard debugging steps. You know the drill.]
ME: Can you read me the code on your printout?
CALLER: One, two, three, zero, zero, zero, zero, zero...
ME: Sorry, the five digit code on your printout. 
ME: It should be under the heading "Passcode," in green.
CALLER: Yeah, that's it.
CALLER: It looks weird like it's going over the box or something though.

At this point, I have a suspicion. I look up the details in the database again.

+---------+--------+----------------+----------+
| fake_id |   name | account_number | passcode |
+---------+--------+----------------+----------+
|    9001 | CALLER | ASDF1234FOOBAR |    123E4 |
+---------+--------+----------------+----------+

Some of you may have spotted the issue already. For those who haven't, let's zoom in.

The passcode is listed in our database as 123E4.

ME: I'm very sorry, but can I call you back?

I had to confirm that we sent the passcode out as plain text to the vendor - we in fact did. Whatever process they used to lay out the prints had somehow interpreted our string as exponential notation all on its own.

We couldn't convince them that it was their issue, or tell them how to fix it, so our solution was...

To stop including the letter "e" in our codes ¯\_(ツ)_/¯
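A rough Python sketch of that workaround, assuming codes are drawn from lowercase letters and digits as in the regex above. `generate_passcode` and `reads_as_number` are hypothetical names, not the original system's functions.

```python
import re
import secrets
import string

# Alphabet without "e", mirroring the workaround in the story.
ALPHABET = (string.ascii_lowercase + string.digits).replace("e", "")

# Digits, then e or E, then digits: the shape a layout tool might treat
# as exponential notation.
SCIENTIFIC = re.compile(r"^\d+e\d+$", re.IGNORECASE)


def generate_passcode(length: int = 5) -> str:
    """Random passcode that can never contain the letter "e"."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def reads_as_number(code: str) -> bool:
    """True when the code would parse as exponential notation, e.g. 123E4."""
    return bool(SCIENTIFIC.match(code))


print(reads_as_number("123E4"))  # the caller's passcode
print(float("123E4"))            # what a number parser makes of it
print(generate_passcode())
```

Dropping one character shrinks the alphabet from 36 to 35 symbols, which costs very little code space compared with the support calls it avoids.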

Garrett Green

On a small island with a large tree, a colony of ants is preparing food for the arrival of a band of grasshoppers. Of the ants that are working, one of them that stands out is an industrious one named Flik. Flik is constantly inventing new things for the colony to reduce labor, but his ideas are often shouted down and shunned by the colony, who feel the old-fashioned way of preparing the grasshoppers' "offering" is the only way to do things. The only one who seems to believe in Flik is the young Princess of the colony named Dot. Secretly, Flik is attracted to her older sister, Atta, who is next in line for the throne.

As the time for the grasshoppers to arrive approaches, the colony heads down into the anthill, intending to wait for the grasshoppers to eat the offering and leave. Unfortunately, Flik is the last one to put his items on the offering stone, and ends up causing the platform to collapse with his piece of machinery. All the food they've gathered spills into the stream.

When the grasshoppers finally arrive and find nothing, they break into the anthill to terrorize the ants. The leader of the grasshoppers, Hopper, demands that the offering be replenished by the Fall season, and terrorizes Dot, before Flik comes forward to try and defend her. Hopper then commands that the offering be doubled, due to Flik's speaking up against them. During the meeting, Hopper's dimwitted brother, Molt, lets it slip that Hopper is afraid of birds. Hopper silences his brother and the grasshoppers leave, promising to return when the last leaf of Autumn falls.

The colony is now in trouble, as there isn't enough food to fulfill Hopper's request and provide sustenance for the colony. Flik is brought before a tribunal in regards to his causing the trouble. As the group convenes, Flik then thinks up a new idea: if the colony could find bigger bugs to help them defend the colony, they could be instrumental in scaring off the grasshoppers from ever returning. The others think this is a bad idea, until Flik volunteers to look. Thinking that his search will take a long time, the council decides to accept Flik's request, figuring he'll be away from them long enough to keep from causing further trouble.

In another part of the region, a motley crew of bugs are performing in a circus led by a flea, PT Flea. The circus isn't well-attended, most of the audience being flies who jeer and cajole the performers, especially when their acts fail. PT, desperate to keep the flies from leaving, announces a new act called "Flaming Death" where his crew will work together to keep him from being burned. The act fails miserably because the bugs can't coordinate their efforts and PT is burned anyway. He fires them all on the spot.

Flik sets off the next day to head for the big city, eventually finding his way into a 'bug bar,' asking around for tough 'warrior bugs.' His attention is suddenly drawn to a group of bugs in a corner, who it seems are preparing to take on a small gang of flies and their huge member, Thud. They make a valiant effort, posing as medieval knights but their ruse fails and the 'bug bar' ends up being wrecked, and Flik misses much of the fight. However, in the aftermath, Flik thinks he's found the perfect guardians to help his colony.

He pleads his case to the group, saying how he's been looking for bugs with their talent, and asking for their help regarding the incoming group of grasshoppers. The group eagerly accepts, thinking Flik wants them to perform at a dinner theatre and, hoping to avoid trouble from the bar's owner & the flies they fought with, they head off for Ant Island.

When Flik returns, Atta and the elders are shocked that Flik actually found 'warrior bugs.' Atta is at first unsure, but the ladybug of the group, Francis (Denis Leary), promises that they will knock the grasshoppers 'dead' when they come.

A party is then held in honor of the group, including a tribute and art showing the warriors fighting off the grasshoppers. The group then grows leery, realizing they're meant to fight a war for the ants instead of merely performing. Rosie whispers to Flik that they're actually just circus bugs and Flik is horrified, accusing the group of tricking them. When Atta appears, Flik convinces her that the circus bugs will fight for them.

Discovering their mutual misunderstanding, the circus bugs attempt to leave, but are forced back by a bird. They work together to save Princess Dot, the Queen's daughter and Atta's sister, from the bird as they flee, gaining the ants' trust in the process. They continue the ruse of being "warriors" so the troupe can continue to enjoy the attention and hospitality of the ants. The bird encounter inspires Flik into creating an artificial bird to scare away Hopper, leader of the grasshoppers. The bird is constructed from sticks and leaves, but the circus bugs are exposed by their former ringmaster, P.T. Flea, when he arrives searching for them. Angered at Flik's deception, the ants exile him and desperately try to pull together enough food for a new offering to the grasshoppers, but fail to do so.

When the grasshoppers discover a meager offering upon their arrival, they take control of the entire colony and begin eating the ants' winter store of food. After overhearing Hopper's plan to kill the queen, Dot leaves in search of Flik and the warrior bugs and convinces them to return and save the colony with his original plan. The plan nearly works, but P.T. Flea lights the artificial bird on fire, causing it to crash and be revealed as a fake. Hopper has Flik beaten by his thug, Thumper, in retaliation, but Flik defies Hopper and inspires the entire colony and the warriors to stand up to the grasshoppers and drive them out of the colony.

Before Hopper can be disposed of, it begins to rain where the drops of water are like large bombs. In the chaos, Hopper viciously pursues Flik, who leads him to the actual bird's nest. Hopper mistakes the real bird for another fake bird, and taunts it, attracting its attention. The grasshopper is eaten by the bird and its chicks.

Some time later, Flik has been welcomed back to the colony, and he and Atta are now a couple. As the troupe departs with the last grasshopper, Molt, as an employee, Atta is crowned the new Queen, while Dot gets the princess' crown. The circus troupe then departs as Flik, Atta and Dot watch and wave farewell in a tree branch.

Sanjay Prajapati

I was working on an issue that occurred while loading the General Settings page in an e-commerce store built with Spree 3.7 + Rails 5.2.3 + Ruby 2.6.

When I tried to open the General Settings page, it never opened and the browser just showed it loading. I checked the back-end log and found that it threw the error FrozenError - can't modify frozen fatal, and nothing else.

After googling the error I found that the FrozenError class was introduced in Ruby 2.5.0, and that it swallows the actual errors.

To debug more precisely, I downgraded Ruby to 2.4.6, and after that I found that the issue was with the wicked_pdf gem. It redefines a render method, which went into an infinite loop, meaning its render method conflicted with other gems. Here are the details: github.com/mileszs/wicked_pdf/pull...

To resolve the issue, I downgraded wicked_pdf from the latest version to tag 1.1.0.

Brad • Edited

I had one bug I'll never forget that I experienced only a few months into my first dev job. It taught me a few very important lessons.

I was writing some AngularJS code a few years ago. I found myself having issues with some errors being introduced by a line of code I added. AngularJS uses the ng-model directive to bind the value of an input to a value in the controller, but the error kept complaining about the directive not existing.
No matter what I did, the error didn't go away.

  • I assumed it was my dev-server caching. It wasn't.
  • I assumed it was some browser caching. It wasn't.
  • I assumed it was my editor caching something. It wasn't.
  • I assumed my laptop was getting hacked. It wasn't.
  • I assumed even ng-model was defective in the current situation, or broken. It wasn't.

By this time I had spent half the day on the issue, and was getting pretty frustrated. I eventually started writing a Stack Overflow post with my code. I assumed I met some edge-case only a highly experienced developer would know, or it was a bug with the framework and I hit the jackpot.

While writing my post I mistyped the word ng-model as ng-modal. A massive tidal wave of inspiration hit. I struggled to contain myself as I searched the original code, and sure enough I found the ng-modal directive instead of ng-model.

I nearly cried as my single letter code change finally fixed the problem.

I realized I needed to use some better tooling, as it would have saved me an entire day of work. I also realized I should have asked for help sooner, as just reviewing the problem provided me with the solution. Finally, I realized it's better to understand the issue than it is to fix it. I spent half the day trying to fix the issue instead of carefully reading the error provided, which stated very clearly: ng-modal instead of ng-model.

Happy bug busting!

Alex Lohr

I once wrote a limited JS parser to remove comments and stitch different JS files together into one bundle in PHP (that was before grunt, gulp and webpack or even node.js or AST-based JS transpilers. Yes, I'm that old). Somehow, it broke the code of one of our applications and I had a hard time figuring out where this happened.

First, I made sure that nothing but comments had been removed, but everything was correct on this account. Then, I had a closer look at the bug.

TypeError: undefined is not a function

So I searched the whole code for calls of variables that could in some circumstances not contain a function. Again, without any result. After long hours of fruitless searching, I had a coffee and tried to look at the problem from a new perspective and suddenly, like clouds indiscernibly shifting forms, I saw the issue.

In two of the files, we had encapsulated our code in an IEFE (immediately executed function expression) not to leak variables into the global scope and forgot a semicolon after the first one. The resulting code went like that:

(function() { ... })()
(function() { ... })();

Do you see it, too? The opening parentheses of the second IEFE were interpreted as a call of the assumed function returned by the first IEFE (which of course returned undefined). I introduced a small filter into my parser that made sure that any IEFE would be terminated with a semicolon so that this error should never ever happen again.
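The guard described above can be approximated very simply: make sure every file ends with a semicolon before it is joined to the next one. The sketch below is in Python rather than the PHP of the original tool, and `bundle` is a hypothetical helper. It assumes comments were already stripped, since appending a semicolon after a trailing line comment would not terminate anything.

```python
def bundle(sources):
    """Concatenate JavaScript sources, ending each one with a semicolon."""
    parts = []
    for source in sources:
        code = source.strip()
        # Terminate the file so the next file's opening parenthesis
        # cannot be parsed as a call on this file's last expression.
        if code and not code.endswith(";"):
            code += ";"
        parts.append(code)
    return "\n".join(parts)


first = "(function() { /* file one */ })()"
second = "(function() { /* file two */ })();"
bundled = bundle([first, second])
print(bundled)
```

Files that already end with a semicolon are left untouched, so running the bundle step twice does not pile up extra semicolons.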

Eugene Cheah • Edited

One word: Cache

We have caches in the browser, the Cloudflare CDN, instance memory, a shared key-value cache, the deployment package (it has its own cache, as we can have 100 nodes pulling it for updates at one time), and finally the DB itself.

Having any layer in between serve the wrong, outdated data is always an exercise in buggy frustration, as you can be stuck wondering why the data is outdated despite deploying the latest build.

Many variants of this story, and it's always the "cache".

The solution is to clear it, the question is which one - and how.

Robin Kretzschmar

The most recent example was the 10pm idea to "Just upgrade react and rewrite all components to arrow functions and use hooks".

Worst.Idea.Ever.

I upgraded React, rewrote all components to be arrow functions and use hooks (useState, useRef, useEffect), as well as property destructuring (const Comp = ({ classes, children, fancyProp }) => instead of just const Comp = props =>).

This took me about 1 hour and I just wanted to run it to check everything is behaving the same before pushing the changes.
Oh boy, I was so sure I'm gonna make it to bed on time!

Starting with wrong dependencies after the React upgrade, I needed to upgrade other packages too. And this was where the struggle began. Suddenly Babel showed me just Not implemented and nothing else.
After searching for a while, I realized that I had forgotten to upgrade the react-router-dom package.
Next try: ... Object({...}) not implemented. 😒
A good 30 minutes later, veins pulsating with rage, I found out that the @material-ui/core package had been upgraded and there were some changes I needed to implement.
Almost 4 hours later, I finally got everything working again and I swore to myself that I would never do something like that so late in the evening 😂

Gabriel Laroche

'Twas six hours before the biggest deployment of the year. I was working peacefully and calmly on a post-launch feature for one of the pages that was being deployed that very night. While testing my feature, I realized that the most important data was not being sent to the last step of a disgusting iframe form. Panic ensued; the boss was mad, but glad we caught it.

The lead front-end and I worked on a quick and dirty temporary fix while another front-end worked on a more stable solution that would require a bit more testing. It all worked out in the end: our dirty fix worked, and the next day we tested and deployed the proper fix.

Jim Medlock

I caught a bug just this morning related to GraphQL pagination. The root source is laziness and technical debt on my part. When writing this code I decided to get it working before adding pagination logic, but didn't add a // TODO .... Needless to say I forgot about it until yesterday when I noticed that an incorrect number of objects were being returned.

I'm now correcting and adding a test case. Boy is my face red!

Manda Putra

My bug that was never solved: my app handles transactions over WebSocket, around 2000 requests per second.

The bug was a memory leak. I use socket.io, and debugging it was painful because I can't fake those requests on my local machine.

At first I blamed socket.io for the memory leak, so I built another app that mimics the same data transactions over WebSockets. It turns out I can't reproduce the memory leak with socket.io.

For now I just load-balance the server. But I'm afraid that as the transaction volume grows and grows, I'll just end up throwing another load-balanced server at it...

Memory leaks are the hardest bugs I have ever encountered.

Rémi Mercier

Currently in the middle of such a story. Can't tell you the end yet. But the hero hasn't lost courage yet and is getting there!

Will keep y'all posted on how the quest to squash that ugly bug is going.

michelemauro

In 2005, I was working on a B2B website that was expected to produce a .INI file for a portable WinCE terminal (think an intelligent barcode reader).

The output file was correct. But the terminal wouldn't read it.

I checked it visually, line by line, with a working file: the text was the same, after all there was little syntax that could be broken. Opened in a text editor, the two files were identical; diff output was empty.

After a couple of days of headbanging, in a desperate move I opened a working and a non-working file with a hex editor.

And saw, at the beginning of the file: FFFE vs FEFF

Try spotting that difference after 2 days of staring at the screen. That's Big Endian UTF BOM vs Little Endian one. The terminal was a Little Endian ARM, and wanted its text encoded that way. The server was a Big Endian Linux on x86, if I remember correctly; an IBM WebSphere, no less.

The solution was a one-liner, adding "UnicodeLittle" in the output stream creation.

Worst time spent/solution size ratio in my career, as of now.
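
To make the two byte order marks visible, here is a small sketch that encodes the same text as UTF-16 in both byte orders and prints the first bytes the way a hex editor would. `encodeUtf16` and `hexPrefix` are illustrative helpers, and the `[config]` text is a made-up INI section header, not the original file. FF FE marks little-endian UTF-16 and FE FF marks big-endian.

```javascript
// Encode a string as UTF-16 with a byte order mark, in either byte order.
// encodeUtf16 is an illustrative helper, not part of any library.
function encodeUtf16(text, littleEndian) {
  const withBom = "\uFEFF" + text; // U+FEFF is the byte order mark
  const bytes = new Uint8Array(withBom.length * 2);
  const view = new DataView(bytes.buffer);
  for (let i = 0; i < withBom.length; i++) {
    // charCodeAt returns one UTF-16 code unit; DataView picks the byte order.
    view.setUint16(i * 2, withBom.charCodeAt(i), littleEndian);
  }
  return bytes;
}

// Render the first n bytes as upper-case hex pairs.
function hexPrefix(bytes, n) {
  return Array.from(bytes.slice(0, n), (b) =>
    b.toString(16).padStart(2, "0").toUpperCase()
  ).join(" ");
}

const little = hexPrefix(encodeUtf16("[config]", true), 4); // "FF FE 5B 00"
const big = hexPrefix(encodeUtf16("[config]", false), 4);   // "FE FF 00 5B"
```

A text editor or diff tool that decodes both files correctly will show identical text, which is why only the raw bytes gave it away.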

selbekk

Not sure if it's been posted already, but the number 1 bug story of the year is this one about a guy who couldn't send emails farther than 500 miles.

web.mit.edu/jemorris/humor/500-miles

Steve Ziegler

I was managing a really small team (me, 1 dev, 1 tester) and we built a node on our government client's data exchange platform. We traveled to their office for acceptance testing. Everything ran fine locally, but when deployed on the server it only worked the first time. It drove us crazy and we wasted weeks onsite trying to fix it. I was convinced there was an issue with the other vendor's platform. Finally, in frustration, I started reviewing all of the developer's code and saw a variable declared as static. You rarely see those in the wild, so I questioned him on it. He just thought it might be a good idea to use a static there. It was completely unnecessary and the reason for our run-once issue. He removed it, recompiled and deployed. It worked the 2nd time, and the 3rd, and the 4th ...

Happy Coding!

starchturrets

Twice I've spent hours trying to debug service workers on my Android tablet (which was frustrating because they seemed to work fine on desktop). Both times it was because I had messed up the paths to the files for caching.

Mahmoud Aldaas • Edited

I recently spent half a day trying to figure out why my code was going haywire inside a loop until my eyes saw that I was doing an assignment (=) instead of a comparison (===) in a control statement...

I was so upset with myself I almost cried, like "always use const for this shit, YOU KNOW THIS !§$%&/" xD
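
A small sketch of that mistake, with made-up function names and data, shows both the silent failure and why `const` helps: assigning to a `const` binding throws instead of quietly changing the value.

```javascript
// Buggy: `=` assigns target to value, and the condition tests the assigned value.
function countMatchesBuggy(values, target) {
  let count = 0;
  for (let value of values) {
    if (value = target) count++;
  }
  return count;
}

// Intended: `===` compares without changing anything.
function countMatches(values, target) {
  let count = 0;
  for (const value of values) {
    if (value === target) count++;
  }
  return count;
}

// Same typo, but with `const`: the assignment now fails loudly.
function countMatchesConstTypo(values, target) {
  let count = 0;
  for (const value of values) {
    if (value = target) count++; // throws a TypeError at runtime
  }
  return count;
}

const buggy = countMatchesBuggy([1, 2, 3], 3); // 3, every element "matches"
const fixed = countMatches([1, 2, 3], 3);      // 1
let constError = null;
try {
  countMatchesConstTypo([1, 2, 3], 3);
} catch (err) {
  constError = err.name;                       // "TypeError"
}
```

A linter rule such as ESLint's `no-cond-assign` catches the same typo before the code ever runs.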

Haseeb Burki

Not exactly a bug, but the story is worth mentioning. At my first job I was working on a CMS for a local med school. It had a lot of features, from attendance to exams to salary calculations, and even a small management system for a dental clinic within the school.

The project was in its testing phase and had been uploaded to production. Everything was done and the client was using it for the first time. Every morning, the first thing I would do was open an email from the client with a list of errors (mostly things the client didn't understand), coupled with screenshots to make them easier to explain.

One day, I opened the email to see a screenshot of a table showing all their employees and their attendance for the current month. Along with the screenshot was a note saying,

"All employees are missing attendance records from dates 22nd, 23rd, 24th and so on..."

To which I replied with a screenshot of a calendar with today's date highlighted as the 21st.

The client did not reply :D

Jan Wedel

I've already posted my debugging story some time ago. It's about how we were trying to find out why some customer passwords did not work.
Enjoy 😊

lepinekong • Edited

In a big corp, on a Java financial application for about 2000 internal users, there was a huge bug that regularly lost data. Each month I had to play Sherlock Holmes, reconstructing the whole update history from the logs with a giant SQL query to put the data back and feed the accounting app. The other department, which had developed it, couldn't find the cause; for them, it was a big application, so from time to time it got overwhelmed and lost data. An improbable explanation, but politically you cannot argue in a big corp... I waited two years for a huge change in the IT system to get a more favorable political context and force a fix. Finally the cause was found: someone had copied and pasted a value, and when a user followed a path they hadn't expected, that bad code was executed and caused data to be dropped. Overall, the cost of this bug, counting only the few days I lost on it each month, was about 100K euros, all for a bad copy and paste by a developer :)

TJ Fogarty

There's way, way more to this story, but here are the bones of it:

A company was contracted to help with the build of a new online streaming service. They destroyed it and I had to spend two straight weeks rebuilding it. Nothing against the person that did it, they were probably under pressure themselves.

I was up early Saturday morning for the launch. Everything was going well, people were able to purchase access to the stream as they should. Then it was decided to reduce the price to encourage people to join. It didn't work. People were being shown one price and charged another (nearly double the new price).

After digging through third-party WordPress plugin code, I discovered that the price was hardcoded. Still not sure to this day why it was done, but it was an experience. Needless to say the project was shut down shortly after.

Miloslav 🏳️‍🌈 🦋 Voloskov

I used to work with CouchDB. I got a strange error, something like ENOENT 0x938ad373ef BROKEN PIPE, so I contacted @wohali (the coolest CouchDB expert on GitHub) and she told me that this nonsense means "Out of disk space" in Erlang. Ouch.

Nico

Two long stories, shortened (for context, I work as a SysOps, so much of my involvement was from a support standpoint).

  1. My team and I were once tasked with rebuilding a payments gateway for a company in Argentina: from a Java 6 monolith to fancy-pants Scala microservices, with Kafka between them and a MySQL database. The idea was to use Kafka as a middleman in case the DB went down. If it was unavailable, some consumer would later pick up from the Kafka log offset and insert all missing transactions.
    In essence (and as far as I remember), the problem was that we did not save that offset correctly, and thus we had lots of transactions stuck in Kafka but not persisted in the DB. The cool part is that this happened during the Lollapalooza ticket launch, so there were thousands of transactions amounting to a few million pesos stuck in Kafka. Around 32 hours without sleep and a few scripts later, we managed to recover it all.

  2. Same payments gateway, long after the Kafka bug. I can't remember exactly which feature was being implemented, but whenever we had to charge the client, say, $100.00, we were actually charging $1.00.
    Remember kids, store cents in a different structure. Learn how floats work, never trust them, and much less use strings or whatever. Two separate integers, always.
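
The advice in the second story can be sketched roughly like this. The `money`, `add` and `format` helpers are hypothetical, only handle non-negative amounts, and are not the gateway's actual code; they just show an amount held as two integers next to the classic float surprise.

```javascript
// Floating-point amounts cannot represent most cent values exactly.
const floatSum = 0.1 + 0.2;            // 0.30000000000000004
const floatIsExact = floatSum === 0.3; // false

// Hypothetical money value held as two integers (non-negative amounts only).
function money(units, cents) {
  const valid =
    Number.isInteger(units) && units >= 0 &&
    Number.isInteger(cents) && cents >= 0 && cents <= 99;
  if (!valid) {
    throw new RangeError("units must be an integer >= 0 and cents an integer in 0..99");
  }
  return { units, cents };
}

// Add two amounts using integer arithmetic only, carrying cents into units.
function add(a, b) {
  const total = (a.units + b.units) * 100 + a.cents + b.cents;
  return money(Math.floor(total / 100), total % 100);
}

// Format for display; cents are always padded to two digits.
function format(m) {
  return `$${m.units}.${String(m.cents).padStart(2, "0")}`;
}

const price = format(money(100, 0));                 // "$100.00"
const sum = format(add(money(0, 10), money(0, 20))); // "$0.30"
```

Keeping the unit explicit in the structure also makes it harder to pass cents where whole currency units are expected, which is one plausible way to end up charging $1.00 instead of $100.00.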

Jack Harner 🚀

Here's a dumb one I just encountered:

Spent WAAAYYY too long wondering why palette_desc wasn't getting set properly.

Sanzeeb Aryal

I once spent a day trying to fix a bug, only to learn it was a feature. I am still confused whether it was a bug or a feature. They didn't understand me.

Bernhard Wittmann

This is probably the best bug story I ever heard (Of course not mine)

The E-Mail that could not be sent farther than 500 miles

Gareth Bradley • Edited

Once there was a very hungry Caterpillar.....🐛

Cubicle Buddha

I wish this article was called, “tell me a ghost story... ghost in the machine story.”

dadJoke

K

Had one in a 10000 LoC PHP file.

One big class everything interconnected, xdebug blew up when I started it.

Took me around two weeks to find and one line of code to fix.

Good times.

Klaus

Worked at the R&D department of a huge retailer.
Someone pushed a small change and the invoices started having 0 instead of the price.
Talking about production, not some dev environment.

Coding Sam • Edited

I published a post about a bug story this week. Check that out

Some bugs are really tricky 😁