DEV Community

Ben Halpern

Tell me a bug story

Bugs are inevitable; Debugging is painful, but the experiences make us better developers.

So let's hear it! Tell us about some of the bugs you've encountered, and how you dealt with them.

Oldest comments (53)

Federico Ramirez

I had a bug and tried to fix it using threads. Nnoww II hhaavvee ttwwoo bbuugss.

briwa

Haha yup, just like what they say about solving a problem with regex: now you have two problems...

Riccardo Bernardini

Regex are great! ☺ Seriously, I think I am one of the few that really like them.

I agree that as they get a bit complex, some regexes look like line noise (I always try to find a less obscure syntax), but for searching and extracting text they are really powerful.

I'm Luis! \^-^/

I was using one of MaterializeCSS's date pickers. This one has a prop called minDate which lets you set the minimum date for the selection. This worked wonderfully.

I was passing the variable to Materialize's date picker, which was a date object. This date object had the date and time selected by the user, so Materialize was working properly.

The problem was sending the data to the server. The date variable was somehow changing. It was losing the time. Time was 00:00:00.
I checked my code, from the very beginning, down to my Redux store. Everywhere. My date variable was OK and it HAD the proper time.
I spent hours checking why it was changing. Is JavaScript suddenly crazy? Why is this happening to me? Could it be a bug IN JavaScript?

Turns out Materialize was mutating the date object. The solution was just cloning it: minDate = new Date(selectedDate)

That fixed the issue. Lesson learned: be careful with mutation.
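
A minimal sketch of that failure mode and the fix, in TypeScript. `thirdPartyPicker` is a hypothetical stand-in for any library that normalizes the date it is given in place; it is not Materialize's actual code, and the sample date is made up.

```typescript
// Hypothetical stand-in for a widget that zeroes the time on the Date it receives.
// This is NOT Materialize's real implementation, only an illustration of in-place mutation.
function thirdPartyPicker(options: { minDate: Date }): void {
  options.minDate.setHours(0, 0, 0, 0); // mutates the caller's object
}

// Made-up user selection: 17 May 2019, 14:30 local time.
const selectedDate = new Date(2019, 4, 17, 14, 30, 0);

// Buggy shape: the widget gets the very object that is later sent to the server.
const shared = new Date(selectedDate.getTime());
thirdPartyPicker({ minDate: shared });
const sharedHours = shared.getHours(); // the time of day is gone

// Fixed shape: hand the widget its own copy
// (same effect as `new Date(selectedDate)` from the comment above).
thirdPartyPicker({ minDate: new Date(selectedDate.getTime()) });
const keptHours = selectedDate.getHours(); // the original keeps its time
```

Copying at the boundary is cheap, and it means the rest of the app does not need to know whether a given library mutates its inputs.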

Ben Halpern

We had a bug in the DEV sign-in process for a long time that would randomly show an "invalid credentials" error (or something like that), passed back with the OAuth response. It was very uncommon, but never went away.

It turned out it wasn't invalid credentials at all; that error message was really a catch-all for "something went wrong". The real cause was a timeout: it sometimes took more than ~9 seconds to create an account, and the request timed out.

That was a doozy. @andy mostly figured it out.
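
One way to keep a timeout from surfacing as "invalid credentials" is to give each failure its own shape and message. This is only a sketch in TypeScript: the `SignInFailure` type and `messageFor` function are hypothetical, and DEV's real sign-in code is not shown in this thread.

```typescript
// Hypothetical failure shapes, for illustration only.
type SignInFailure =
  | { kind: "invalid_credentials" }
  | { kind: "timeout"; elapsedMs: number }
  | { kind: "unknown"; detail: string };

// Map each failure to its own message instead of one catch-all.
function messageFor(failure: SignInFailure): string {
  switch (failure.kind) {
    case "invalid_credentials":
      return "Invalid credentials.";
    case "timeout":
      return `Sign-in timed out after ${failure.elapsedMs} ms. Please try again.`;
    case "unknown":
      return `Something went wrong: ${failure.detail}`;
  }
}

const timeoutMessage = messageFor({ kind: "timeout", elapsedMs: 9000 });
```

With a distinct `timeout` case, a slow account creation shows up in logs and in the UI as what it is, which makes the rare failure much easier to recognize.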

Vince Ramces Oliveros

I still experience the same thing when I sign in with my GitHub account. I do want to submit an issue, but... it's a feature.

Andy Zhao (he/him)

We have some recently reported bugs about sign in issues, but each case has been a little bit different so far.

If you're having issues with your account, feel free to submit an issue for any bugs or feature requests to the repo: github.com/thepracticaldev/dev.to

We also provide support via our email: yo@dev.to

Andy Zhao (he/him)

Oh yeah!! That bug was a trip. 😥

Manda Putra

Oh it is, when my connection is slow I always get that Invalid Cred message 😂

Phil Nash

I used to work on a site that you could log into with your Instagram account, pick a bunch of pictures and buy them as fridge magnets.

This worked mostly well, except for the occasional pack where some or all of the images would fail to download. We spent ages working with the code that downloaded the images, trying to find where the bug was. (I feel more confident with downloading images in Ruby now!)

Ultimately, we decided the code that was downloading the images wasn't the issue, so perhaps it was the Instagram API? Or a flaky connection from our server?

More investigation led to the discovery that on occasion a user would just delete their picture from Instagram, leading to our failed download.

So, we moved the downloading from on demand when the print job was run to a background worker once the user made their purchase. Jobs would still occasionally fail.

We moved the job to before the user even completed their purchase. This helped, but jobs would still occasionally fail.

I'm not even sure you could call it a bug in the end. Some users were uploading pictures to Instagram just to get them printed and then deleting them immediately. It didn't matter how many workers we ran against the job queue, there was always a user that was faster at deleting their images. The eventual fix was the loosening of the Instagram restriction, instead allowing users to use Facebook photos or upload photos from their computer/phone. When users no longer had to use only Instagram to get their images on to our site things became better. This was more work for us (Instagram photos were just square at the time, which fit the magnets, opening up to non-square photos meant we needed an image cropper and just a lot more UI) but was better for the user.

Am I calling users bugs here? Of course not! But understanding the ways that user actions can affect the way your site works is just as important as an esoteric language exception. And if something is failing, there are more ways to fix it than just inspecting the code.

Dave Jacoby

We are a lab in a school that does science for other labs. We had a page that had been working fine for most people for a while, but we got a report that one person was trying to use this page and the links wouldn't work, which meant she couldn't do her work.

I take a look, click links, and everything looks fine. And then I look at the logs, and by process of elimination, I figure out that this user is a Mac user using Safari.

I take a look at the source, and ...

We all know that URIs are (protocol)(server)(path), and this page would be (https://)(dev.to)(/ben/tell-me-a-bug-story-59e2), and you can do "absolute" links with just the (path), but did you know that you can include URLs that are protocol and path without server?

No, you didn't. Because that is an abomination before Tim.

But someone did. Someone, or a significant number of someones, did this, enough to make it so IE accepted it. And Firefox. And Chrome. But Safari didn't.

This was hard for me to test, because I have almost no Apple in my life.

So, to clarify: The bug isn't that we use this abomination (although that's what I fixed) but that it is industry standard for browsers to accept this, and Safari didn't. Be liberal in what you accept, I guess.

So, we:

  1. changed our code so we didn't continue in sin
  2. I created an Apple Dev account so I had access to the issue tracker, and when I tried to submit this bug, the bug tracker crashed. :shrug:
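
A small TypeScript sketch of the first fix. It assumes the offending links looked like `https:/path` (scheme, one slash, then the path); the comment above does not show the exact markup, so both the pattern and the `normalizeHref` helper are hypothetical.

```typescript
// Matches "http:/path" or "https:/path": a scheme followed by exactly one slash,
// that is, a protocol and a path with no server part.
const SCHEME_WITHOUT_SERVER = /^https?:\/(?!\/)/i;

// Rewrites such a link to a plain absolute path; everything else passes through.
function normalizeHref(href: string): string {
  return href.replace(SCHEME_WITHOUT_SERVER, "/");
}

const fixedLink = normalizeHref("https:/ben/tell-me-a-bug-story-59e2");
const fullLink = normalizeHref("https://dev.to/ben/tell-me-a-bug-story-59e2");
const pathLink = normalizeHref("/ben/tell-me-a-bug-story-59e2");
```

Emitting path-only links sidesteps the question of how each browser resolves the unusual form.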
 
Jack Harner 🚀

I created an Apple Dev account so I had access to the issue tracker, and when I tried to submit this bug, the bug tracker crashed. :shrug:

That's the best part of this one 😂

I definitely feel your pain on debugging Apple issues. I have access to an iPod Touch at work, but that's about it. Debugging on that without a Mac to connect it to is basically just trial and error.

Ben Halpern

I’ve never been heavily involved in the Apple ecosystem but I’ve been around enough to feel this pain 😭

Cubicle Buddha

I wish this article was called, “tell me a ghost story... ghost in the machine story.”

dadJoke

Javier Guerra

My favorite story is the time we were preparing for a big presentation on a new feature, which involved rendering lots of things with Handlebars in the browser. Everything was working perfectly. We showed it to the manager, he liked it, and he asked us for a rehearsal of the presentation later that day.

During the rehearsal, the page stopped working. The browser didn't render a thing. After some debugging, it turned out that the machine where the presentation was happening had done an automatic Chrome update that bloated the GPU usage, but only on OS X Yosemite. It took us many hours to find this out. But the fix was easy: use another machine for the presentation, and wait a couple of weeks for another Chrome update.

Sanzeeb Aryal

I once spent a day on a bug fix only to learn it was a feature. I am still confused whether it was a bug or a feature. They didn't understand me.

briwa • Edited

The recent one was about Highcharts. You were supposed to have an animation when hovering the mouse on the legend.

jsfiddle.net/y92haq35 (somehow the fiddle can't be embedded)

It was fine in an isolated environment, but somehow the animation didn't appear on our page. I thought it was a configuration issue, so I copied and pasted the exact same config into our page. It didn't appear there either. I was having trouble inspecting the styles because the hover state is triggered by JS rather than by CSS.

I inspected the source code, but every hover class was firing properly. I tried it on a different page in the app, and the animation was there. So something on the original page was causing it.

After painstakingly removing the components/modules on that page one by one to see which one was causing the problem, I found a piece of CSS that goes like this:

/*
* (a note about a bug it's trying to fix)
*/
.highcharts-series-hover {
  opacity: 1 !important;
}

Basically this rule says every Highcharts series gets an opacity of 1 on hover, so even if the animation kicks in, the rule overrides it (with the !important) and it looks like there is no animation. Should've fixed the actual bug from the issue tracker...

And that concludes 3 hours of debugging. I think I didn't do a good job debugging it, any suggestions? 😂

On another note, how do you all prevent these kinds of CSS bugs? Visual regression tests? Eye tests?

Kamran Ayub

Recently we had a performance issue on a web app I was working on. We showed a series of questions where one question's answer may lead to a few new ones showing up.

The issue was that after answering the first question, answering the second question took 2 MINUTES to render the next set of questions. The UI was blocked the whole time.

I started to dig in. Our app let users manage 50 different products at once so this issue only started to manifest when you managed a ton of products. I also counted that the second question triggered 13 additional questions to show up.

When I started to log the number of Redux actions being dispatched, I was amazed to see we were dispatching 6000+ actions (and re-rendering on each one), which is what was causing the slowdown. Each action dispatch took about 20ms (x 6000 =~ 2mins).

The questions had a graph structure where questions could relate to one another. Turns out we were following this graph structure and dispatching actions even if no actual data was changing, so I updated the logic to compare previous values; that cut it down to about 650 actions (50 products x 13 questions) which is what it took to make the new questions visible.
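
The "compare previous values" step can be sketched like this in TypeScript. The `SetVisible` action, the `Visibility` state shape, and `changedOnly` are hypothetical names for illustration; the app's real actions are not shown in the comment.

```typescript
// Hypothetical action and state shapes, for illustration only.
type SetVisible = { type: "SET_VISIBLE"; questionId: string; visible: boolean };
type Visibility = { [questionId: string]: boolean };

// Returns only the actions that would actually change state.
function changedOnly(currentState: Visibility, proposed: SetVisible[]): SetVisible[] {
  return proposed.filter((a) => currentState[a.questionId] !== a.visible);
}

const currentState: Visibility = { q1: true, q2: false, q3: false };
const proposedActions: SetVisible[] = [
  { type: "SET_VISIBLE", questionId: "q1", visible: true },  // no change, skipped
  { type: "SET_VISIBLE", questionId: "q2", visible: true },  // real change, kept
  { type: "SET_VISIBLE", questionId: "q3", visible: false }, // no change, skipped
];
const toDispatch = changedOnly(currentState, proposedActions);
```

Filtering before dispatch means the store and every subscriber only do work for actions that carry new information.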

This reduced the time from 2 minutes to 25 seconds. Because the rest of the actions were technically needed to change state, I introduced the redux-batched-actions package to batch all those actions and dispatch one single action. Doing so reduced the time down to about 2 seconds. Much better!
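
To show why batching helps, here is a tiny hand-rolled store in TypeScript. It is not Redux and it does not use the redux-batched-actions API; `createStore`, `reduceBatch`, and the `SHOW` action are hypothetical names, and the notification counter merely stands in for re-renders.

```typescript
// A tiny hand-rolled store, NOT Redux or redux-batched-actions; names are illustrative.
type Action = { type: "SHOW"; id: string };
type BatchAction = { type: "BATCH"; actions: Action[] };
type State = { visible: string[] };

function reduceOne(state: State, action: Action): State {
  return { visible: state.visible.concat(action.id) };
}

// Folds every inner action before returning, so one dispatch covers the whole batch.
function reduceBatch(state: State, action: Action | BatchAction): State {
  return action.type === "BATCH"
    ? action.actions.reduce(reduceOne, state)
    : reduceOne(state, action);
}

function createStore(initial: State) {
  let state = initial;
  let notifications = 0; // stands in for subscriber re-renders
  return {
    dispatch(action: Action | BatchAction): void {
      state = reduceBatch(state, action);
      notifications += 1; // subscribers hear about each dispatch once
    },
    getState: () => state,
    getNotifications: () => notifications,
  };
}

// 650 pending actions, matching the 50 products x 13 questions from the comment.
const pendingActions: Action[] = [];
for (let i = 0; i < 650; i++) pendingActions.push({ type: "SHOW", id: `q${i}` });

const oneByOne = createStore({ visible: [] });
pendingActions.forEach((a) => oneByOne.dispatch(a)); // one notification per action

const batched = createStore({ visible: [] });
batched.dispatch({ type: "BATCH", actions: pendingActions }); // a single notification
```

Both stores end in the same state; the difference is how many times subscribers are asked to react along the way.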

Eventually, what I discovered was adding significant time was the JS ... spread operator. What! Turns out because we needed to support IE11, the spread operator was being polyfilled and this implementation was slow as heck. We switched some critical code to using assign instead and it reduced one function's total execution time from 1s to 12ms.

Overall, I got it down from 2 minutes to <100ms by doing these optimizations and simplifying the complexity of some functions to faster O(1) and O(N) implementations.

What a doozy! That took a good week to work through but the improvements were to core code in the app so the entire app benefited from it!

Phil Nash

Wow! That is quite the performance improvement. If only redux warned if that many actions were dispatched, it feels to me like 6000 actions for one request is unlikely to ever be on purpose!

Nice job tracking it all down though. 2 minutes to under 100ms is amazing!

Tim Downey

I work on a Ruby API that acts as a core piece of the control plane for an open-source PaaS called Cloud Foundry. Our users install and operate the platform on the infrastructure of their choice (on-prem vSphere, AWS, GCP, Azure, etc.) and everyone tends to use it a little bit differently. This leads to lots of possible configurations and makes certain types of bugs hard to triage and even harder to reproduce.

One bug (or unforeseen usage pattern) we had seems really obvious in hindsight, but ended up taking weeks of investigation. We had some users report that their APIs were consuming huge amounts of memory, and every six minutes would reach ~8GB of RAM usage and restart. Now Ruby isn't the most lightweight programming language, but it shouldn't be that bad! We initially suspected a bad memory leak, but pausing the interpreter and manually forcing garbage collection freed up most of the memory. So we ended up crawling through heap dumps (wrote a blog post on this process with my team) and eventually found out there were tons and tons of User model objects in memory.

Turns out that this installation had all of their users (10,000+) belonging to the same organizational unit (called a Space) and we had a frequently-accessed line of code that was loading this full array of users into memory every time an API endpoint was hit. It was simply trying to do an existence check to see if a particular user was a member of the Space in question, but because of how we used our ORM it was instantiating and loading all users within that space into memory. Since our test environments (and many other production environments) tend to only put dozens or hundreds of users in a Space we hadn't encountered this.

The fix ended up being super simple:
Do the existence check in SQL instead of in Ruby
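
The difference between the two shapes can be sketched in TypeScript with an in-memory array standing in for the database. Everything here is hypothetical: the `Membership` rows, the function names, and the SQL in the comments are illustrations, not Cloud Foundry's actual schema or Ruby code.

```typescript
// Hypothetical membership table with 10,000 users in one space.
type Membership = { spaceId: number; userId: number };

const membershipRows: Membership[] = [];
for (let userId = 1; userId <= 10000; userId++) membershipRows.push({ spaceId: 1, userId });

let materialized = 0; // counts row objects built for application code

// Slow shape: roughly "SELECT * FROM memberships WHERE space_id = ?",
// then searching the loaded objects in application code.
function isMemberInApp(spaceId: number, userId: number): boolean {
  const loaded = membershipRows
    .filter((r) => r.spaceId === spaceId)
    .map((r) => ({ spaceId: r.spaceId, userId: r.userId })); // one object per user
  materialized += loaded.length;
  return loaded.some((r) => r.userId === userId);
}

// Fast shape: roughly
// "SELECT 1 FROM memberships WHERE space_id = ? AND user_id = ? LIMIT 1".
// The scan below simulates the database doing the work; the app only gets yes or no.
function isMemberInDb(spaceId: number, userId: number): boolean {
  return membershipRows.some((r) => r.spaceId === spaceId && r.userId === userId);
}

const viaApp = isMemberInApp(1, 42);
const builtByAppCheck = materialized;                  // every user in the space
const viaDb = isMemberInDb(1, 42);
const builtByDbCheck = materialized - builtByAppCheck; // nothing extra
```

Both checks give the same answer; only the first one pays for thousands of short-lived objects on every request.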

😂

Eugene Cheah • Edited

One word: Cache

We have caches in the browser, the Cloudflare CDN, instance memory, a shared key-value cache, the deployment package (it has its own cache, as we can have 100 nodes pulling it for updates at one time), and finally the DB itself.

When any layer in between serves "outdated" data, it is always an exercise in buggy frustration: you can be stuck wondering why the data is outdated despite deploying the latest build.

Many variants of this story, and it's always the "cache".

The solution is to clear it, the question is which one - and how.

Coding Sam • Edited

I published a post about a bug story this week. Check it out.

Some bugs are really tricky 😁

Robin Kretzschmar

The most recent example was the 10pm idea to "Just upgrade react and rewrite all components to arrow functions and use hooks".

Worst.Idea.Ever.

I upgraded React, rewrote all components to be arrow functions and use hooks (useState, useRef, useEffect), as well as property destructuring (const Comp = ({ classes, children, fancyProp }) => instead of just const Comp = props =>).

This took me about 1 hour and I just wanted to run it to check everything is behaving the same before pushing the changes.
Oh boy, I was so sure I'm gonna make it to bed on time!

Starting with wrong dependencies after the react upgrade, I needed to upgrade other packages too. And this was where the struggle began. Suddenly babel showed me just Not implemented and nothing else.
After searching a while I became aware that I forgot to upgrade the react-router-dom package.
Next try: ... Object({...}) not implemented. 😒
A good 30 minutes later and with rage pulsating veins, I found out that the @material-ui/core package was upgraded and now there are some changes I need to implement.
Almost 4 hours later, I finally got everything working again and I swore to myself that I would never do something like that so late in the evening 😂
