I was tasked with creating some Ansible configs for these build agents. The machines being spun up from them were identical, but spread across 3 different networks: A, B, and C. The big difference was one zip. A and B got it from shared drives, but C pulled it from our Artifactory. I was told that the one in Artifactory was the same as the one on A and B.
A and B were fine, but machines on C were failing. I figured it was the zip, and it was... but it took the whole day and two 30-minute Zoom meetings with different folks.
The problem? Well all 3 zips had the same name: Dir_X.Y.Z_14.0 but
The zips on A and B unzipped to C:\Path\To\Dir_X.Y.Z_14.0
The zip on C unzipped to C:\Path\To\Dir_X.Y.Z-14.0
A single-character typo brought me to my knees lol. Someone renamed the directory to have a hyphen, but the zip they created still had an underscore, lol. Ahh good times.
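A check like the following could catch this kind of mismatch before a deploy. It is only a sketch: the function names and the `make_demo_zip` helper are made up for illustration, and it assumes the archive is supposed to unpack to a single top-level entry.

```python
import io
import zipfile


def top_level_names(zip_source):
    """Return the set of top-level entry names inside a zip archive."""
    with zipfile.ZipFile(zip_source) as archive:
        # Zip member names always use "/" as the separator.
        return {name.split("/", 1)[0] for name in archive.namelist()}


def unpacks_to(zip_source, expected_dir):
    """True only if the archive has exactly one top-level entry named expected_dir."""
    return top_level_names(zip_source) == {expected_dir}


def make_demo_zip(dir_name):
    """Demo helper (hypothetical): build an in-memory zip with one directory."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(f"{dir_name}/readme.txt", "hello")
    return buffer


if __name__ == "__main__":
    archive = make_demo_zip("Dir_X.Y.Z-14.0")  # hyphen inside the archive
    print(unpacks_to(archive, "Dir_X.Y.Z_14.0"))  # underscore expected by the config
```

Comparing the archive's contents against the directory name the config expects takes seconds, versus a day of staring at two names that look the same.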
If I ever get to 3 hours staring at the same bug, I generally get up and go for a walk or get some other eyes on it, or try to tackle a different task and come back to the bug later.
Maybe not related, but definitely have had long stretches where a certain bug is 'fixed' only to pop up again a week down the line...
The most I've worked uninterrupted on the same bug is probably around a week. It was one of the worst bugs I'd faced, too: some of our clients' data would get randomly deleted for no reason, and no one had any idea what was happening. I spent days debugging every single API, trying to determine what could do that...
I eventually ended up parsing the MySQL binlog, searching for every DELETE statement on that table, finding where each one came from in our codebase, and rerunning them one by one...
Turns out someone had forgotten some parentheses in an 'OR' condition months before.
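For anyone wondering how missing parentheses can delete the wrong rows: in SQL, AND binds tighter than OR. Here is a small sketch using an in-memory SQLite database. The table, columns, and conditions are invented for illustration and are not the original query.

```python
import sqlite3

# Hypothetical conditions: the intent is "client 1's expired or draft rows".
BUGGY_WHERE = "client_id = 1 AND status = 'expired' OR status = 'draft'"
FIXED_WHERE = "client_id = 1 AND (status = 'expired' OR status = 'draft')"


def remaining_ids(where_clause):
    """Run DELETE with the given WHERE clause and return the ids that survive."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE records (id INTEGER, client_id INTEGER, status TEXT)")
    conn.executemany(
        "INSERT INTO records VALUES (?, ?, ?)",
        [(1, 1, "expired"), (2, 1, "active"), (3, 2, "draft"), (4, 2, "active")],
    )
    conn.execute(f"DELETE FROM records WHERE {where_clause}")
    ids = [row[0] for row in conn.execute("SELECT id FROM records ORDER BY id")]
    conn.close()
    return ids


if __name__ == "__main__":
    print(remaining_ids(BUGGY_WHERE))  # client 2's draft row is gone too
    print(remaining_ids(FIXED_WHERE))  # only client 1's expired row is deleted
```

Without the parentheses the condition is read as `(client_id = 1 AND status = 'expired') OR status = 'draft'`, so any client's draft rows match.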
I spent several weeks trying to figure out why images from Windows Snipping Tool could not paste into Quill WYSIWYG and then a couple more weeks trying to fix it and work with other kinds of text and image pastes. I even wrote an issue for it that's still open! I've changed jobs 3 times since I wrote this and now I'm back to using it again in my current work project.
A paste event is detected, but the images never show when you try to copy and paste images from things like Windows Snipping Tool. Copying and pasting images from Google, for example, has no issues.
It seems like there is a timing issue for reading files with a base64. I have not been able to reproduce a "fix" I discovered in CodePen, but in the actual project I'm using Quill for, extending the Clipboard module and lengthening the timeout duration at the end of the default onPaste function makes pasting from Snipping Tool work. The bigger the image that needs to be pasted, the larger the duration needs to be.
Again, I am not able to reproduce a bug caused by my "fix", but in my project, lengthening the timeout duration causes two "regular" images to be pasted. I'm throwing this part out there in case it comes up for anyone else. It may be something in my project.
3 months, not non-stop obviously, but I continuously went back and tried multiple things multiple times. Even did a 100% full-on re-install of the operating system.
The issue? Bad vim-airline fonts on my Raspberry Pi.
The solution? Run a command to update the firmware of the Raspberry Pi.
Can't remember precisely how long, but probably half a month to one month. It was a dotnet "thread starvation" issue, where it was just running out of threads to run operations. Had a lot of false flags and a lot of debugging to find the actual issue. I hope to never see that error again (Vietnam flashback).
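The original issue was in .NET, but the general shape of thread starvation can be sketched in Python: once every thread in a fixed-size pool is stuck in a blocking call, newly queued work just sits there. The function name and the sizes below are illustrative assumptions, not anything from the original system.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def probe_pool(pool_size, blocking_jobs, wait=0.2):
    """Fill a pool with blocking jobs, then check whether new work can still run."""
    pool = ThreadPoolExecutor(max_workers=pool_size)
    release = threading.Event()
    # Each blocker holds one worker thread until `release` is set.
    for _ in range(blocking_jobs):
        pool.submit(release.wait)
    probe = pool.submit(lambda: "ran")
    try:
        outcome = probe.result(timeout=wait)
    except FutureTimeout:
        outcome = "starved"  # the probe is queued, but no thread is free to run it
    release.set()  # unblock the workers so the pool can shut down cleanly
    pool.shutdown(wait=True)
    return outcome


if __name__ == "__main__":
    print(probe_pool(pool_size=2, blocking_jobs=2))
    print(probe_pool(pool_size=3, blocking_jobs=2))
```

Nothing crashes in the starved case, which is probably part of why this kind of bug produces so many false leads: the work is simply never picked up.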
It was a very, very long time ago, in the late '90s.
I wrote a little game in Watcom C (somewhere between Wolfenstein and Doom: only walls and a simple floor, but not only perpendicular walls). Trigonometric functions were very expensive, so I used a generated sin table. I copy-pasted it into the source, but it looked ugly, so I lined it up with leading zeros. That was a mistake, because numbers written with a leading 0 are octal in C, so very strange things started to happen on the screen. It took a few hours to debug this, and after that I was literally banging my head on the desk.
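To see what the padding does to the values, here is a small Python sketch that reads a digits-only token the way a C compiler reads an integer literal. The function name is made up, and it deliberately ignores hex literals and suffixes.

```python
def parse_c_integer_literal(token):
    """Interpret a digits-only token the way C does: a leading 0 means octal."""
    if len(token) > 1 and token.startswith("0") and token.isdigit():
        # int() raises ValueError for digits 8 or 9 here, much like C rejects "089".
        return int(token, 8)
    return int(token, 10)


if __name__ == "__main__":
    for token in ("100", "0100", "064"):
        print(token, "->", parse_c_integer_literal(token))
```

So a table entry padded from `100` to `0100` silently becomes 64, while entries that needed no padding keep their value.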
Unfortunately, I once spent 3 months on a single bug. It was a very long time and I got desperate about it.
It was a teleoperation application with a Universal Robots robot. The robot demo worked correctly in our office, but not when our sales rep did the demo at UR. For some reason, one demo worked but the other didn't: we didn't seem to have any communication between the robots. Well, the second demo worked for 2 to 3 minutes, and then both robots stopped and stayed blocked. It took a very long time to solve, because we didn't have the setup to reproduce the bug in our office and I was not in good health either.
In the end, the problem came from our robot DLL. There had been an update between the two demos that I was not aware of. A colleague had decided that an undocumented division by 2 was not "supposed" to be necessary, so he removed it from the real-time library without further notice and pushed it to production. The result was that communication between the two robots was not running at the correct speed. It was 2 times slower, so teleoperation was not possible. Several months later, the bug is still in production, because "it was not the problem". Well, fixing it actually solved mine, and I had figures to prove it.
This happened some time ago; I am not at this company anymore.
I tend to be the sort to leave something and come back to it in a week, especially for intermittent issues that are hard to reproduce.
Though I did get a fun text the other day "why does this test not work in IE?" from my last job's devs... 1. because I stopped trying to do a hack to fix it and 2. because <linked them to the IEDriver GitHub issue where Selenium said making IE work was so low priority that they didn't care typing was broken>
Technically I probably spent 4 years with those tests working in IE barely half the time because of timing around typing. I solved it by getting a new job :P
3 days. I neglected to filter NASA API results on the backend, so when Curiosity took 14,000 photos of Mars on one day it crashed my site lol. I was new to Python/Flask so it took me a while to figure out-- it was as simple as using a limit method!
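The idea of capping results on the backend can be sketched like this. The function name and the default of 25 are assumptions; the original fix may well have used a query-level limit instead.

```python
from itertools import islice


def limit_results(photos, limit=25):
    """Return at most `limit` items from any iterable of API results."""
    # islice stops early, so a 14,000-item response is never fully copied.
    return list(islice(photos, limit))


if __name__ == "__main__":
    fake_photos = ({"id": n} for n in range(14000))  # stand-in for the API response
    print(len(limit_results(fake_photos)))
```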
A Quill editor was rendering incorrectly: all the line breaks between paragraphs (p tags) were stripped and multiple paragraphs were stitched together, but only in some specific Vue components (a fact that I should've paid much more attention to). I tried mixing and matching every single configuration (well, probably not, but definitely over a dozen) I could find in their documentation and their GitHub repo. Finally, I even dove into the source code, but with little success.
Finally, the cause was stupidly simple. In those components, paragraphs were styled with either display: flex or display: inline. I literally jumped and cheered at the moment, half celebrating my success and half laughing at myself. Nonetheless, a GREAT lesson learned. :)
Ha! I have a painful one. Took me nearly a good 3 days to find it. (between tackling other stuff when I hit a wall)
Note: I'm in GMT timezone.
The company I work for has an analytics view which takes a deep dive into the analytics of the media the company serves. In November 2019, we got a message from a client saying numbers from our Excel download functionality didn't match those of their internal systems.
The numbers started off fine, but then massively increased after an arbitrary date. (clue 1)
The client was on the west coast of America; we provide all our analytics in UTC time. (clue 2)
The client had multiple occurrences where the analytics were wrong after the arbitrary date. (clue 3)
I didn't have a problem when getting the data. (clue 4).
The problem?
Daylight saving
Without going into specifics... the problem was that going back an hour and then calling .startOfDay() on that date meant we would end up with two days' worth of data after daylight saving.
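The exact code isn't shown, so this is only a guess at the mechanism: if a timestamp sitting exactly on a day boundary is shifted back an hour before being truncated to the start of the day, the window silently grows by a day. The function names and the date are illustrative.

```python
from datetime import datetime, timedelta, timezone


def start_of_day(moment):
    """Truncate a datetime to midnight of the same calendar day."""
    return moment.replace(hour=0, minute=0, second=0, microsecond=0)


def report_window(day, shift_back=timedelta(0)):
    """Return (start, end) for a one-day report, optionally shifting the start first."""
    start = start_of_day(day - shift_back)  # a shift across midnight lands on the previous day
    end = start_of_day(day) + timedelta(days=1)
    return start, end


if __name__ == "__main__":
    day = datetime(2019, 11, 4, tzinfo=timezone.utc)
    for shift in (timedelta(0), timedelta(hours=1)):
        start, end = report_window(day, shift)
        print(shift, "->", end - start)
```

With no shift the window is one day long; with a one-hour shift back, truncating lands on the previous midnight and the same report covers two days.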
It was years ago so I don't remember, but probably about a week.
My first job was a startup providing an e-commerce recommendation service, doing the frontend stuff, and we'd just added carousel functionality to our service, so you can see lots of recommendations. For some strange reason, adding the carousel also caused the page we put it on to sprout 6 feet of whitespace on the right side. There was no (visible) added content, the page just got a scrollbar that ran for a ways off to the right.
I wound up saving a local copy of the page, and removing scripts until I found the culprit. It turned out that the carousel we were using was one of those types that puts the content in a big ribbon div and moves that left and right within the window div, with overflow: hidden on the window so we didn't render the full ribbon. However, even if we couldn't see the ribbon, the client's quick-view library could, and slapped its own "Hover here to see the quick-view modal" buttons on everything in the ribbon. Again, this was years ago so I'm probably misremembering exactly how this worked- Because those buttons were outside of the ribbon, and absolutely positioned, they still took up render space, but because they were set to hide until the user hovered over the product image they were associated with, they were not visible on the page.
IIRC the solution was to add some hack on the carousel so that it toggled the quick-view tags on hidden/revealed images and then called the quick-view function to do its business. Awful, but it worked.
A colleague and I decided that we wanted to make a tool specific to our work. We're both into programming and seem to know our stuff, but it's not our primary work task and we weren't hired for it. Pretty easy stack tbh: SQL and PHP. He did backend, I did frontend.
I set up sql-server locally with all the correct tables and got my colleague's code and started my tweaking.
At first I was having some issues with running the php-site directly through php -S localhost:8000 . and connecting to the database. Having some experience with programming in Python and knowing that a clean environment is the best environment, I thought, why not just make a clean virtual machine with Ubuntu Server and XAMPP? I set it up with a NAT network adapter and forwarded ports from localhost to the VM. Then I installed the newest Ubuntu Server and started coding on that instead.
But I experienced the same issue.
Start DBeaver to check out the db, yep seems fine, the db and tables are all there and looks great. I have another go. Same issue.
As a dirty fix, I started coding directly on the staging/prod-server just to make sure that my changes are working as intended. They do, and gradually it crawls to a completion.
It was only after two months and about 200 commits that I realised I had never stopped the local sql-server running on my machine or changed the host and credentials for the sql-server.
It was the same database the whole time 🤦
I spent a week trying to figure out if I was doing something wrong or if I had found a genuine bug in WCF. My gosh it almost broke my spirit. I don't think it has been resolved yet. github.com/microsoft/dotnet/issues...
Ewww, nasty. I too have spent waaay too long reading the source for WCF when things do not behave as documented/expected! Probably the longest was when investigating session leakage while using the WS-SecureConversation protocol. It seems absolutely nobody else in the world made that decision, and we probably shouldn't have either, but customers were now using it (30k+ of them) so we had no choice but to find & fix the leaks.. all told a rotating team of 3-4 people spent ~1 year (over a period of 6 months) finding all the ways customers could break stuff and patching up the server side...
Just before I retired, we had a plan to emulate the session aspects of the protocol, and I had a POC working which avoided actual server-side sessions, it employed JWTs to carry the security session data back and forth instead. This would have fixed a lot of problems with state management and scalability, I have no idea if it got implemented!
Recently 5 days, off and on between meetings. No stack trace, just a build that kept slowly moving along taking almost 1 hour until I tracked down the culprit: Emotion 10 and how it handles type definitions can slow TypeScript compilation to a crawl. I figured it out by looking for similarities between packages that were slow in a monorepo, then commented out code until I found what caused the slowness and got the build down from 45 minutes to less than 1 minute.
TL;DR: I wrote a GitHub Action using Docker and Bash without knowing a lot about either. Someone let me know it wasn't working for them, so I spent probably an hour a month looking at it for about 6 months before forgetting about it.
Eventually, someone else opened a PR to fix it. Open Source FTW!
Three weeks. This was when I first started out in web development. I fixed a bug that prevented the project from building on Heroku, but I kept pushing to the wrong git branch (I used git push instead of git push origin master). So when I pushed again to Heroku it would fail over and over again. I have never made that mistake again.
In sum: three weeks with 2-3 developers. It was a corrupted pointer in a medical device. The issue was really hard to reproduce, and the root cause was even harder to understand. In the end we found a couple of threads which tried to release a pointer, and only one implementation of those three threads was broken. And it was a legacy codebase without support from the authors.
Depends on how you count, probably. Weeks to months I'd say, in terms of a quite peculiar persistency bug bringing an application server to a screeching halt every now and then for no obvious reason. Fixing this was rather trivial as soon as we actually understood what went wrong.
A week and a half, trying to override a CSS rule from a .NET Core app that was causing styling issues in a child React-based app. I needed help from another dev for 4 days until we got the fix. Man... that was a challenge on another level for me.
Over the years, time scales have shrunk for both release cycles and debugging. I started programming on hard real-time systems, using assembly code. Back then we would normally achieve one (occasionally two) releases a year. The system was expected to be in service for a minimum of 10 years.
I can think of one intermittent fault on an interface between two systems which took me nearly a decade to find. This wasn't continuous effort, but I had at least three attempts at resolving it. By the time I started looking at the bug, the system was already a legacy system, and a replacement contracted through our competitor was on its way. I was a junior engineer known for having an aptitude for low-level coding, so I was put to work. Not having access to one of the two systems, I could only review the code and write a report.
Two or three years later our customer dug up my report and agreed we could have access to both systems; the catch was that I only had a single day on site at the opposite end of the country. The nature of intermittent faults is that the fault will not occur when you are debugging; true to form, the system ran perfectly for the whole day and no useful information was gained.
As luck would have it, our competitor failed to deliver on its promises, and our legacy system got a life extension and was rehosted on new hardware. I led the software effort, and it went into service, at which point I left the project. Once in service, the original intermittent fault came back with a vengeance, and our customer was not happy. I got seconded back to help fix the issue; we enhanced our simulator to emulate the other system and started debugging. Eventually I found the issue, which we traced to the use of the wrong entry point in an error recovery routine in the real-time OS. The programmer, some 22 years earlier, had typed a 5 instead of a 3. The junior engineer who modified the simulator for me was younger than the bug! Having fixed the bug, I was reminded of the report I had written nearly 10 years before, which correctly pointed to the exact error routine at fault.
A couple of days; an issue with pointers in a C program.
It began as usual: the program dies with a segmentation fault, open it with the debugger to check where the fault happened and... the stack is a nonsensical mess. Ouch. This is not a good sign, stinks of dangling pointers or similar.
In cases like this the actual error can be anywhere, and it can take a veeery long time to find the actual bug. It turned out that the problem was not with just a pointer, but with a pointer to pointer to pointer to ... three or four levels deep.
I am soooo happy that I now code in Ada and not in C anymore.
I don't have a means to know, but I recall one which would have been around 1 month, but most of that time was ignoring the bug.
I had just come on to the project, the bugs had been mentioned but were not something I could directly start investigating.
I had to build out my test infrastructure, with mocks of our integrated component. This meant reading 3rd party API docs and building the correct communication lines.
After all of this was worked out, replicating the bug was easy and being specific with the cause was just a matter of describing what the code was doing.
I recently had a bug in my message stack, which took a few days to isolate what was happening, then took 2-3 weeks to fix, as I had to build a new message stack.
Once I was working on a project that used ElasticSearch. I was changing some things in a list page and noticed that the results were pretty random.
After maybe 2 or 3 days trying to understand what was going on, I discovered my local ES config had the default cluster name and was open on the network, so it automatically created a cluster with a colleague's machine and I was seeing his data.
I don't think it was the bug I spent the most time on, but it's one I don't forget.
Now that I think about it, it's pretty funny, but it was definitely not at the time. ;)
I have written about it before: one of my longest bug hunts took over 2 weeks. The eventual fix was switching 2 lines of code.
Sure, it was not in our code base, but a somewhat minor bug in a 3rd party library. Combined with another 3rd party library, and another one, and the way our code base was set up... it basically made our software unusable.
There was another bug, and I have no idea how much time I spent debugging it. This again was caused by a combination of 3rd party libraries in a given setup. In total I might have spent way more than 2 weeks figuring this one out. The main issue was that it was leaking memory quite slowly. It took more than a week before you would even notice it in monitoring. It eventually led to attacking the bug in two libraries [1][2]. I don't know how much time I spent on this, as I tried to tackle it multiple times over a long period. It wasn't a really high priority issue.
And then there was a bug which plagued our software for a really long time and resulted in quite a few P1 issues where we had to restart the server. This was years ago. It was caused by the bad way our software was set up with mixed technology, resulting in deadlocks which killed the whole system. At that point we were using both our own ORM and Hibernate, mixed with EJBs and other things in Spring. We were kind of in the middle of the transition from JEE and our own ORM to a Spring-based setup with Hibernate. Some entities were used in both ways. Fixing this was no trivial task. But eventually I figured out a workaround that could sustain us while we (slacked) in continuing the move to the newer technology. Again, no idea how much time I spent on this. I also did not really fix the bug.
So my monster-queries library leaked memory and it took me I think 4 years before I fixed the bug, and by fixed the bug I had to create a new language using AST which I call querylet: github.com/teacherseat/querylet
~2 months of trying to find a race condition in a bunch of goroutines. I had to create a docker image with the debugging bits included (dlv) to connect to it remotely.
15 days. It was just that I used class variables in places where I needed instance variables. This caused race conditions and thus those variables were updated from different threads (Sidekiq jobs). Yes, I was new to Ruby back then, but that was really a pain.
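The original was Ruby, but the same trap exists in Python, so here is a sketch of the difference. The class names are invented, and the real Sidekiq jobs hit it from multiple threads, which only makes it harder to spot.

```python
class SharedStateJob:
    # Class attribute: ONE list shared by every instance (and every thread).
    processed = []

    def run(self, item):
        self.processed.append(item)


class IsolatedStateJob:
    def __init__(self):
        # Instance attribute: each job object gets its own list.
        self.processed = []

    def run(self, item):
        self.processed.append(item)


if __name__ == "__main__":
    a, b = SharedStateJob(), SharedStateJob()
    a.run("job-a")
    b.run("job-b")
    print(a.processed)  # contains both items, even though `a` only ran one

    c, d = IsolatedStateJob(), IsolatedStateJob()
    c.run("job-c")
    d.run("job-d")
    print(c.processed)  # contains only c's own item
```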
I was working as an iOS developer at the time for a hardware company. The app I was working on could connect to the hardware device and you could configure it using my app over USB.
I was debugging why the connection would drop after sending too much data to the device. The protocol we were using, usbmux, was reverse engineered and turned into a C library (libusbmux). Troubleshooting that was moderately easy, but troubleshooting things on the iOS device was super difficult.
At the time Apple had not implemented network debugging, so I didn't have any way to actually debug my app while also having it connected to the device. It took SO MANY times of just logging out to a file until I finally caught a clue.
The entire protocol is TCP based, with a server on both ends that handles sending the packets over USB. I noticed that the send buffers on the iOS device were quickly filling up, despite the data reaching the hardware device. Knowing this, I was finally able to identify that under certain situations libusbmux would not send TCP ACKs for received data, and the iOS device would think the data never arrived.
It took so long because:
1- I had no access to any actual debugging methods other than printing to a log file
2- Modifying the code on the hardware I was connecting to was very tedious and took forever (compiling the Linux kernel is slow!)
It also did not help that the people that implemented this feature had long since left the company, and the poor guy who wound up owning this feature just wanted nothing to do with it.
At one point I was just questioning my own sanity as I found myself, with the title of "Web Developer", working on an iOS app, debugging the TCP stack in the Linux kernel.
2 weeks straight. Streaming parallel programming in Rust. It was incredible once I got it working, though, it was for an image thumbnailer that would update the list of thumbnails in realtime. It could process thousands of images in only a couple seconds.
The bug was critical but the cause was extremely subtle. By the end, it wasn't just me trying to fix it but the whole team including the team lead. It was just a quirk in Angular.
The problem? We were setting everything up in our new self-hosted Kubernetes cluster. We set up our CI/CD pipeline on it, but it wasn't able to build our stuff; specifically, it failed to fetch stuff from the internet.
The bug ended up being a misconfigured MTU value somewhere between Kubernetes, CI/CD, and the DinD used to build.
Two lines of config change later and everything is working fine.
I spent a good two or three days debugging an issue on a React Native app running on, at the time, a newly released iPhone X. A new feature the team was developing started to lag only on iPhone X devices.
I found out that there was a memory issue after using the search feature, but I was not finding out why it was happening. After removing everything, I found out a TextInput property was causing it. Removing it fixed the issue and it didn't impact on the app usability at all.
I don't remember if I found out why that property was causing the issue, tho.
I was 3 months into learning how to code. I offered my services to my cousin to build an app for his business. I used Creative Tim's Bootstrap Material Kit UI kit. I forgot to load the JavaScript script tag in the index.htm, which led to the hamburger menu being buggy. I realized after 3 days that the script tag was the culprit. 🤦
My cousin ended up not pushing through with the deal due to financial issues, aka I priced too high at $2k for a static site.
I used to work on some data processing software for a particular measuring device (is that vague enough for you? Sorry, I'm just being overly protective). I would get reports that the software was slowing down if it was used for days at a time. I would occasionally look into the issue, but it stuck around for months until one of our in-house users was able to show me the problem in vivo.
The problem itself was a memory leak, but for some reason I couldn't get that "ah ha!" moment until I saw it in context.
I guess I didn't spend all that long over all debugging the issue, but it played on my mind for all that time.
We were modernizing our reporting solution from Crystal Reports to SSRS, and there was a formula written to calculate adjusted hours (CST / IST time difference, weekends, holidays, etc.). This formula was written in COBOL with no documentation. I was trying to convert it to C#. The target was to get the same values from both formulas. The bug was in calculating weekend hours, and it took me a whole 3 days to figure that out.
It was a horrible bug and the code was mixed between frontend & backend, with several scripts essentially running the same thing / interfering with each other and causing the issue.
It was about one month. We used a third-party BPM engine. After a month we identified a memory leak. This was at least possible with WinDbg, and we identified what was consuming the memory. So we found that Dispose didn't dispose of the internal resources...
Well, when I was working on the original version of Janky in Python, I was working for god remembers how long trying to figure out how to get properties to work.
And when making the RuntDeale prototype in Py, I had a bit of an ordeal with how Tkinter wanted to render things.
I don't know the actual number, but a few months. I fixed the bug after taking a 3 week break over the holidays. Came in on January 4th and fixed it in like an hour.
I have some nice embedded programming stories for ya:
Two old colleagues of mine spent about one week on a particular issue:
They were working on a SIP stack (for audio connections/sessions), when suddenly it stopped working completely. After one week it turned out that the PBX (kind of a phone/SIP router) had blacklisted their device for too many failed calls...
I spent about 1.5 months on another issue with a driver for flash memory. TLDR: some bit in a settings register was not set/reset by our driver, so depending on whether the device had used an older driver before, it would work perfectly OR shift everything by 1 byte.
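A toy model of that register problem, in Python for readability. The bit names and values are entirely made up; the point is only that a driver which leaves a bit untouched inherits whatever a previous driver wrote.

```python
ENABLE_BIT = 0x01       # hypothetical "device enabled" bit
OFFSET_MODE_BIT = 0x04  # hypothetical bit that changes the data layout


def init_partial(current):
    """Init that only sets the bit it cares about and leaves the rest alone."""
    return current | ENABLE_BIT


def init_explicit(current):
    """Init that also forces the layout bit into a known state."""
    return (current | ENABLE_BIT) & ~OFFSET_MODE_BIT & 0xFF


if __name__ == "__main__":
    fresh_device = 0x00
    touched_by_old_driver = OFFSET_MODE_BIT
    print(init_partial(fresh_device), init_partial(touched_by_old_driver))
    print(init_explicit(fresh_device), init_explicit(touched_by_old_driver))
```

With the partial init, two devices end up in different states depending on their history; with the explicit init, both end up the same.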
The unfortunate part of embedded programming (at least back then) was that:
it took around 2 minutes to compile and flash ANY CHANGE that you had.
many errors show up when the linker gets involved, which is at 99% of the compilation process (so after 1 minute and 45 seconds, something like that)
especially in the beginning, none of us knew how to debug/profile embedded software.
Later on we added profilers and proper debugging setups (and a hardfault handler that printed stacktraces. GAMECHANGER!)
Ages ago I had to debug an issue in production, a different kind of production, as the app was installed in terminals across 3 giant supermarkets. I had to figure out why the game sometimes gave out more prizes than anticipated. It took me 3 days: I would spend the entire night debugging at home, then install the new version in the terminals the next morning at 7 am before opening hours and observe. I finally found out that I was missing a crucial step, which was to flush the cache so the data would be persisted. The app was written in Adobe btw
In total maybe 3-4 days, but that was spread out over the space of a couple of weeks trying to figure out why something was happening the way it was for specific users.
I frequently spend a day only to sleep on it, realize I was focusing on the wrong thing, and fix it reasonably quickly. It's amazing what stepping back can do to help you attack a problem!
Probably a whole 9-hr work day and some change.
^this
The most I've worked uninterrupted on the same bug is probably around a week. It was one of the worst bugs I'd faced, too: some of our clients' data would get randomly deleted for no reason, and no one had any idea what was happening. I spent days debugging every single API, trying to determine what could do that...
I eventually ended up parsing the MySQL binlog, searching for every delete statement on that table, finding where it came from in our codebase, and rerunning them one by one...
Turns out someone had forgotten some parentheses in an 'OR' condition months before.
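To illustrate how missing parentheses around an OR can widen a condition, here is a small sketch. It uses SQLite (bundled with Python) as a stand-in for MySQL, since AND binds more tightly than OR in both. The table, columns, and data are invented for the example, and it uses SELECT to show which rows a DELETE with the same WHERE clause would hit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER, client_id INTEGER, status TEXT)")
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?)",
    [
        (1, 1, "expired"),
        (2, 1, "active"),
        (3, 2, "expired"),
        (4, 2, "cancelled"),
    ],
)

# Intent: for client 1 only, match rows that are expired OR cancelled.
without_parens = (
    "SELECT id FROM records "
    "WHERE client_id = 1 AND status = 'expired' OR status = 'cancelled'"
)
with_parens = (
    "SELECT id FROM records "
    "WHERE client_id = 1 AND (status = 'expired' OR status = 'cancelled')"
)

# Without parentheses the condition parses as
# (client_id = 1 AND status = 'expired') OR status = 'cancelled',
# so client 2's cancelled row is matched as well.
ids_without = sorted(row[0] for row in conn.execute(without_parens))
ids_with = sorted(row[0] for row in conn.execute(with_parens))
print(ids_without, ids_with)
```

The unparenthesized version quietly reaches into another client's rows, which is the sort of thing that looks like random deletion from the outside.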
I spent several weeks trying to figure out why images from Windows Snipping Tool could not paste into Quill WYSIWYG and then a couple more weeks trying to fix it and work with other kinds of text and image pastes. I even wrote an issue for it that's still open! I've changed jobs 3 times since I wrote this and now I'm back to using it again in my current work project.
Cannot paste images from Snipping Tool #2539
A paste event is detected, but the images never show when you try to copy and paste images from things like Windows Snipping Tool. Copying and pasting images from Google, for example, has no issues.
It seems like there is a timing issue for reading files with a base64. I have not been able to reproduce a "fix" I discovered in CodePen, but in the actual project I'm using Quill for, extending the Clipboard module and lengthening the timeout duration at the end of the default onPaste function makes pasting from Snipping Tool work. The bigger the image that needs to be pasted, the larger the duration needs to be.
Again, I am not able to reproduce a bug caused by my "fix", but in my project, lengthening the timeout duration causes two "regular" images to be pasted. I'm throwing this part out there in case it comes up for anyone else. It may be something in my project.
Steps for Reproduction
Expected behavior: All image pasting should behave consistently.
Actual behavior: Cannot paste images from snipping tools.
Platforms: Windows 10 (I have not tested this on others yet) Chrome 72
Version: My project uses 1.3.4, but the issue persists in 1.3.6. The CodePen is using 1.3.4.
3 months, not non-stop obviously, but I continuously went back and tried multiple things multiple times. Even did a 100% full-on re-install of the operating system.
The issue? Bad vim-airline fonts on my Raspberry Pi.
The solution? Run a command to update the firmware of the Raspberry Pi.
Can't remember precisely how long, but probably half a month to a month. It was a dotnet "thread starvation" issue, where it was just running out of threads to run operations. Had a lot of false flags and a lot of debugging to find the actual issue. I hope to never see that error again (Vietnam flashback).
It was a very, very long time ago, in the late '90s.
I wrote a little game in Watcom C (somewhere between Wolfenstein and Doom: only walls and a simple floor, but with walls that were not only perpendicular). Trigonometric functions were very expensive, so I used a generated sin table. I copy-pasted it into the source, but it looked ugly, so I lined it up with leading zeros. That was a mistake, because numbers with a leading 0 are octal in C, so very strange things started to happen on the screen. It took a few hours to debug this, and after that I was literally banging my head on the desk.
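A quick way to see what the padding did: in C, an integer literal that starts with 0 is read in base 8. Python 3 refuses such literals outright, so the sketch below mimics the C rule with a hypothetical helper, parse_c_integer_literal, applied to some made-up table entries (the real sin table is not shown in the story).

```python
def parse_c_integer_literal(text: str) -> int:
    """Interpret a decimal or octal integer literal the way a C compiler would.

    Only plain decimal and leading-zero octal are handled; hex and suffixes
    are left out of this sketch.
    """
    if len(text) > 1 and text.startswith("0"):
        return int(text, 8)  # leading zero means base 8 in C
    return int(text, 10)


# Made-up table entries, zero-padded to four characters "for looks".
padded_table = ["0000", "0017", "0100", "0707", "1000"]
values = [parse_c_integer_literal(entry) for entry in padded_table]
print(values)
```

Entries containing an 8 or a 9 (for example 0089) would not even compile in C, and int("0089", 8) raises ValueError here for the same reason; the dangerous ones are the entries that are valid octal and silently change value.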
Unfortunately, I once spent 3 months on a single bug. It was very long and I got desperate about it.
It was a teleoperation application with a Universal Robot. The robot demo worked correctly in our office, but not when our sales rep did the demo at UR. For some reason, one demo was working but not the other: we didn't seem to have any communication between the robots. Well, the second demo worked for 2 to 3 minutes, and then both robots stopped and stayed blocked. It took a very long time to solve this, because we didn't have the setup to reproduce the bug in our office, and I was not in good health either.
In the end, the problem came from our robot DLL library. There was an update between the two demos that I was not aware of. A colleague had decided that a division by 2, which was not documented, was not "supposed" to be necessary. So he removed it from the real-time library without further notice and pushed it to production. The result was that communication between the two robots was not running at the correct speed. It was 2 times slower, so teleoperation was not possible. Several months later, the bug was still in production, because "it was not the problem". Well, restoring the division solved mine, actually, and I had figures to prove it.
This happened some time ago, I am not in this company anymore.
I tend to be the sort to leave something and come back to it in a week, especially for intermittent issues that are hard to reproduce.
Though I did get a fun text the other day "why does this test not work in IE?" from my last job's devs... 1. because I stopped trying to do a hack to fix it and 2. because <linked them to the IEDriver GitHub issue where Selenium said making IE work was so low priority that they didn't care typing was broken>
Technically I probably spent 4 years with those tests working in IE barely half the time because of timing around typing. I solved it by getting a new job :P
3 days. I neglected to filter NASA API results on the backend, so when Curiosity took 14,000 photos of Mars on one day it crashed my site lol. I was new to Python/Flask so it took me a while to figure out-- it was as simple as using a limit method!
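For what it's worth, the idea of capping results on the backend can be sketched in a few lines of plain Python. The limit_results helper and the fake photo records are hypothetical; the actual NASA API response shape and the limit method used in the original Flask app are not shown here.

```python
from itertools import islice


def limit_results(photos, max_items=25):
    """Return at most `max_items` entries from an iterable of API results."""
    return list(islice(photos, max_items))


# Stand-in for a day on which the rover returned 14,000 photos.
all_photos = ({"id": n} for n in range(14_000))
page = limit_results(all_photos, max_items=25)
print(len(page))
```

Because islice stops pulling from the source after max_items entries, the remaining results never get materialized by this helper.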
On or off for about a month.
A Quill editor was rendering incorrectly: all the line breaks between paragraphs (p tags) were stripped and multiple paragraphs were stitched together, but only in some specific Vue components (a fact that I should've paid much more attention to). I tried mixing and matching every single configuration (well, probably not, but definitely over a dozen) I could find in their documentation and their GitHub repo. Finally, I even dived into the source code, but with little success.
In the end, the cause was stupidly simple. In those components, paragraphs were styled with either display: flex or display: inline. I literally jumped and cheered at the moment, half celebrating my success and half laughing at myself. Nonetheless, a GREAT lesson learned. :)
Ha! I have a painful one. Took me nearly a good 3 days to find it (between tackling other stuff when I hit a wall).
Note: I'm in GMT timezone.
The company I work for has an analytics view which takes a deep dive into the analytics of media the company serves. In November 2019, we got a message from a client saying numbers from our Excel download functionality didn't match those of their internal systems.
The numbers started off fine, but then massively increased after an arbitrary date. (clue 1)
The client was on the west coast of America; we provide all our analytics in UTC time. (clue 2)
The client had multiple occurrences where the analytics were wrong after the arbitrary date. (clue 3)
I didn't have a problem when getting the data. (clue 4).
The problem?
Daylight saving
Without going into specifics.. the problem was going back an hour and then calling .startOfDay() on that date meant we would end up with two days worth of data after daylight savings.
Painful to find...easy to fix.
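The specifics aren't given above, so the following is only one hypothetical way that "go back an hour, then take the start of the day" can turn a one-day window into a two-day one. The start_of_day helper stands in for whatever .startOfDay() was in the real codebase, and the dates are arbitrary.

```python
from datetime import datetime, timedelta


def start_of_day(moment: datetime) -> datetime:
    """Truncate a datetime to midnight of the same calendar day."""
    return moment.replace(hour=0, minute=0, second=0, microsecond=0)


# Assumed scenario: the report is requested for a single day, starting at midnight.
requested_day = datetime(2019, 11, 4)

# An hour is subtracted (for example as an offset correction) ...
shifted = requested_day - timedelta(hours=1)      # 2019-11-03 23:00

# ... and truncating afterwards lands on the PREVIOUS day's midnight.
window_start = start_of_day(shifted)              # 2019-11-03 00:00
window_end = requested_day + timedelta(days=1)    # 2019-11-05 00:00

span = window_end - window_start
print(window_start, window_end, span)
```

Truncating first and shifting second (or doing the arithmetic on timezone-aware values) keeps the window at one day in this sketch.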
It was years ago so I don't remember, but probably about a week.
My first job was a startup providing an e-commerce recommendation service, doing the frontend stuff, and we'd just added carousel functionality to our service, so you can see lots of recommendations. For some strange reason, adding the carousel also caused the page we put it on to sprout 6 feet of whitespace on the right side. There was no (visible) added content, the page just got a scrollbar that ran for a ways off to the right.
I wound up saving a local copy of the page, and removing scripts until I found the culprit. It turned out that the carousel we were using was one of those types that puts the content in a big ribbon div and moves that left and right within the window div, with overflow: hidden on the window so we didn't render the full ribbon. However, even if we couldn't see the ribbon, the client's quick-view library could, and slapped its own "Hover here to see the quick-view modal" buttons on everything in the ribbon. Again, this was years ago, so I'm probably misremembering exactly how this worked. Because those buttons were outside of the ribbon, and absolutely positioned, they still took up render space, but because they were set to hide until the user hovered over the product image they were associated with, they were not visible on the page.
IIRC the solution was to add some hack on the carousel so that it toggled the quick-view tags on hidden/revealed images and then called the quick-view function to do its business. Awful, but it worked.
Not specifically a bug, but hear me out.
A colleague and I decided we wanted to make a tool specific to our work. We're both into programming and seem to know our stuff, but it's not our primary work task and we weren't hired for it. Pretty easy stack tbh: sql and php. He did backend, I did frontend.
I set up sql-server locally with all the correct tables and got my colleague's code and started my tweaking.
At first I was having some issues with running the php-site directly through php -S localhost:8000 . and connecting to the database. Having some experience with programming in Python and knowing that a clean environment is the best environment, I thought, why not just make a clean virtual machine with Ubuntu Server and XAMPP? I set it up with a NAT network adapter and forwarded ports from localhost to the VM. Then I installed the newest Ubuntu Server and started coding on that instead.
But I experienced the same issue.
Start DBeaver to check out the db, yep seems fine, the db and tables are all there and looks great. I have another go. Same issue.
As a dirty fix, I started coding directly on the staging/prod-server just to make sure that my changes are working as intended. They do, and gradually it crawls to a completion.
It was only two months and about 200 commits later that I realised I had never stopped the local sql-server running on my machine, and never changed the host and credentials to the sql-server.
It was the same database the whole time.
I spent a week trying to figure out if I was doing something wrong or if I had found a genuine bug in WCF. My gosh it almost broke my spirit. I don't think it has been resolved yet.
github.com/microsoft/dotnet/issues...
Ewww, nasty. I too have spent waaay too long reading the source for WCF when things do not behave as documented/expected! Probably the longest was when investigating session leakage while using the WS-SecureConversation protocol. It seems absolutely nobody else in the world made that decision, and we probably shouldn't have either, but customers were now using it (30k+ of them) so we had no choice but to find & fix the leaks.. all told a rotating team of 3-4 people spent ~1 year (over a period of 6 months) finding all the ways customers could break stuff and patching up the server side...
Just before I retired, we had a plan to emulate the session aspects of the protocol, and I had a POC working which avoided actual server-side sessions, it employed JWTs to carry the security session data back and forth instead. This would have fixed a lot of problems with state management and scalability, I have no idea if it got implemented!
Recently 5 days, off and on between meetings. No stack trace, just a build that kept slowly moving along taking almost 1 hour until I tracked down the culprit: Emotion 10 and how it handles type definitions can slow TypeScript compilation to a crawl. I figured it out by looking for similarities between packages that were slow in a monorepo, then commented out code until I found what caused the slowness and got the build down from 45 minutes to less than 1 minute.
1 year.
TLDR; I wrote a GitHub Action using Docker and Bash without knowing a lot about both. Someone let me know it wasn't working for them so I spent probably an hour a month looking at it for about 6 months before forgetting about it.
Eventually, someone else opened a PR to fix it. Open Source FTW!
and now a link to that PR github.com/bdougie/invite-based-on...
Three weeks. This was when I first started out in web development. I fixed a bug that prevented the project from building on Heroku, but I kept pushing to the wrong git branch (I used git push instead of git push origin master). So when I pushed again to Heroku it would fail over and over again. I have never made that mistake again.
In sum: three weeks with 2-3 developers. It was a corrupted pointer in a medical device. The issue was really hard to reproduce, and the root cause was even harder to understand. In the end we found a couple of threads which tried to release a pointer, and only one implementation of those three threads was broken. And it was a legacy codebase without support from the authors.
Depends on how you count, probably. Weeks to months, I'd say, for a quite peculiar persistency bug bringing an application server to a screeching halt now and then for no obvious reason. Fixing this was rather trivial as soon as we actually understood what went wrong.
A week and a half, trying to overwrite a CSS rule from a .NET Core app causing styling issues in a child React-based app. I needed help from another dev for 4 days until we got the fix. Man... that was a challenge at another level for me.
Over the years, time scales have shrunk for both release cycles and debugging. I started programming on hard real-time systems, using assembly code. Back then we would normally achieve one (occasionally two) releases a year. The system was expected to be in service for a minimum of 10 years.
I can think of one intermittent fault on an interface between two systems which took me nearly a decade to find. This wasn't continuous effort, but I had at least three attempts at resolving it. By the time I started looking at the bug, the system was already a legacy system, with a replacement contracted through our competitor on its way. I was a junior engineer known for having an aptitude for low-level coding, so I was put to work. Not having access to one of the two systems, I could only review the code and write a report.
Two or three years later, our customer dug up my report and agreed we could have access to both systems. The catch was that I only had a single day on site at the opposite end of the country. The nature of intermittent faults is that when you are debugging, the fault will not occur. True to form, the system ran perfectly for the whole day and no useful information was gained.
As luck would have it, our competitor failed to deliver on its promises, and our legacy system got a life extension and was rehosted on new hardware. I led the software effort and it went into service, at which point I left the project. Once in service, the original intermittent fault came back with a vengeance, and our customer was not happy. I got seconded back to help fix the issue; we enhanced our simulator to emulate the other system and started debugging. Eventually I found the issue, which we traced to using the wrong entry point in an error recovery routine in the real-time OS. The programmer, some 22 years earlier, had typed a 5 instead of a 3. The junior engineer who modified the simulator for me was younger than the bug! Having fixed the bug, I was reminded of the report I had written nearly 10 years before, which correctly pointed to the exact error routine at fault.
A couple of days; an issue with pointers in a C program.
It began as usual: the program dies with a segmentation fault, open it with the debugger to check where the fault happened and... the stack is a nonsensical mess. Ouch. This is not a good sign, stinks of dangling pointers or similar.
In cases like this the actual error can be anywhere, and it can take a veeery long time to find the actual bug. It turned out that the problem was not with just a pointer, but with a pointer to pointer to pointer to ..., three or four levels deep.
I am soooo happy that I now code in Ada and not in C anymore.
I don't have a means to know, but I recall one which would have been around 1 month, but most of that time was ignoring the bug.
I had just come on to the project, the bugs had been mentioned but were not something I could directly start investigating.
I had to build out my test infrastructure, with mocks of our integrated component. This meant reading 3rd party API docs and building the correct communication lines.
After all of this was worked out, replicating the bug was easy and being specific with the cause was just a matter of describing what the code was doing.
Being QA I sent it off for someone else to fix.
I recently had a bug in my message stack, which took a few days to isolate what was happening, then took 2-3 weeks to fix, as I had to build a new message stack.
I wrote about it in dev.to/mortoray/high-throughput-ga...
My game has encountered several major defects, usually in libraries or the browsers, which required a lot of effort to workaround.
Once I was working on a project that used ElasticSearch. I was changing some things in a list page and noticed that the results were pretty random.
After maybe 2 or 3 days trying to understand what was going on, I discovered my local ES config had the default cluster name and was open on the network, so it automatically formed a cluster with a colleague's machine and I was seeing his data.
I don't think it was the bug I spent the most time on, but it's one I don't forget.
Now that I think about it, it's pretty funny, but it was definitely not at the time. ;)
I have written about it before: one of my longest bug hunts took over 2 weeks. The eventual fix was switching 2 lines of code.
Sure, it was not in our code base, but a somewhat minor bug in a 3rd party library. Which, combined with another 3rd party library, and another one, and the way our code base was set up... basically made our software unusable.
There was another bug, and I have no idea how much time I spent debugging it. This again was caused by a combination of 3rd party libraries in a given setup. In total I might have spent way more than 2 weeks figuring this one out. The main issue was that it was leaking memory quite slowly. It took more than a week before you would even notice it in monitoring. It eventually led to attacking the bug in two libraries [1] [2]. I don't know how much time I spent on this, as I tried to tackle it multiple times over a long period. It wasn't a really high priority issue.
And then there was a bug which plagued our software for a really long time and resulted in quite a few P1 issues where we had to restart the server. This was years ago. It was caused by the bad way our software was set up with mixed technology, resulting in deadlocks which killed the whole system. At that point we were using both our own ORM and Hibernate, mixed with EJBs and other things in Spring. We were kind of in the middle of the transition from JEE and our own ORM to a Spring-based setup with Hibernate. Some entities were used in both ways. Fixing this was no trivial task. But eventually I figured out a workaround that could sustain us while we (slacked) in continuing the move to the newer technology. Again, no idea how much time I spent on this. I also did not really fix the bug.
So my monster-queries library leaked memory and it took me I think 4 years before I fixed the bug, and by fixed the bug I had to create a new language using AST which I call querylet:
github.com/teacherseat/querylet
~2 months of trying to find a race condition in a bunch of goroutines. I had to create a docker image with the debugging bits included (dlv) to connect to it remotely.
15 days. It was just that I used class variables in places where I needed instance variables. This caused race conditions and thus those variables were updated from different threads (Sidekiq jobs). Yes, I was new to Ruby back then, but that was really a pain.
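The original was Ruby with Sidekiq, but the same trap exists in most object-oriented languages. Here is a rough Python analogue, with invented class names, showing how state declared on the class is shared by every instance while state set up in the constructor is not. It runs on a single thread to keep the output deterministic; with real worker threads the shared list would additionally be mutated concurrently.

```python
class SharedStateJob:
    results = []  # class attribute: ONE list shared by every instance

    def perform(self, item):
        self.results.append(item)
        return self.results


class IsolatedStateJob:
    def __init__(self):
        self.results = []  # instance attribute: a fresh list per job

    def perform(self, item):
        self.results.append(item)
        return self.results


first, second = SharedStateJob(), SharedStateJob()
first.perform("a")
shared_view = second.perform("b")      # also sees the first job's item

third, fourth = IsolatedStateJob(), IsolatedStateJob()
third.perform("a")
isolated_view = fourth.perform("b")    # sees only its own item

print(shared_view, isolated_view)
```

With background jobs running in parallel threads, the shared version is where the race conditions come from: several jobs read and write the same object without knowing about each other.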
Probably close to a month.
I was working as an iOS developer at the time for a hardware company. The app I was working on could connect to the hardware device and you could configure it using my app over USB.
I was debugging why the connection would drop after sending too much data to the device. The protocol we were using, usbmux, was reverse engineered and turned into a C library (libusbmux). Troubleshooting that was moderately easy, but troubleshooting things on the iOS device was super difficult.
At the time Apple had not implemented network debugging, so I didn't have any way to actually debug my app while also having it connected to the device. It took SO MANY times of just logging out to a file until I finally caught a clue.
The entire protocol is TCP based, with a server on both ends that handles sending the packets over USB. I noticed that the send buffers on the iOS device were quickly filling up, despite the data reaching the hardware device. Knowing this, I was finally able to identify that under certain situations libusbmux would not send TCP ACKs for received data, and the iOS device would think the data never arrived.
It took so long because:
1- I had no access to any actual debugging methods other than printing to a log file
2- Modifying the code on the hardware I was connecting to was very tedious and took forever (compiling the Linux kernel is slow!)
It also did not help that the people that implemented this feature had long since left the company, and the poor guy who wound up owning this feature just wanted nothing to do with it.
At one point I was just questioning my own sanity as I found myself, with the title of "Web Developer", working on an iOS app, debugging the TCP stack in the Linux kernel.
2 weeks straight. Streaming parallel programming in Rust. It was incredible once I got it working, though, it was for an image thumbnailer that would update the list of thumbnails in realtime. It could process thousands of images in only a couple seconds.
2 Weeks.
The bug was critical but the cause was extremely subtle. By the end, it wasn't just me trying to fix it but the whole team including the team lead. It was just a quirk in Angular.
We all learned something that day.
About 8 hours total across two days.
The problem? We were setting everything up in our new self-hosted Kubernetes cluster. We set up our CI/CD pipeline on it, but it wasn't able to build our stuff; specifically, it failed to fetch stuff from the internet.
The bug ended up being a misconfigured MTU value somewhere between Kubernetes, CI/CD, and the DinD used to build.
Two lines of config change later and everything is working fine
At some point I just decide it's a feature and not a bug anymore.
Or I remove the buggy feature and pretend like it never existed :).
I spent some good two or three days debugging an issue on a React Native app running on, at the time, a newly released iPhone X. A new feature the team was developing started to lag only on iPhone X devices.
I found out that there was a memory issue after using the search feature, but I was not finding out why it was happening. After removing everything, I found out a TextInput property was causing it. Removing it fixed the issue and it didn't impact on the app usability at all.
I don't remember if I found out why that property was causing the issue, tho.
3 days.
I was 3 months into learning how to code. I offered my services to my cousin to build an app for his business. I used Creative Tim's Bootstrap Material Kit UI kit. I forgot to load the JavaScript script tag in the index.htm, which led to the hamburger menu being buggy. I realized after 3 days that the script tag was the culprit.
My cousin ended up not pushing through with the deal due to financial issues, aka I priced too high at $2k for a static site.
Site is still live tho enzobbs420.netlify.app/
Oh man, the price is really high for the project. However, good job for completing it when you were a beginner :)
I used to work on some data processing software for a particular measuring device (is that vague enough for you? Sorry, I'm just being overly protective). I would get reports that the software was slowing down if it was used for days at a time. I would occasionally look into the issue, but it stuck around for months until one of our in-house users was able to show me the problem in vivo.
The problem itself was a memory leak, but for some reason I couldn't get that "ah ha!" moment until I saw it in context.
I guess I didn't spend all that long over all debugging the issue, but it played on my mind for all that time.
Longest I have spent is 3 days..
We were modernizing our reporting solution from Crystal Reports to SSRS, and there was a formula written to calculate adjusted hours (CST / IST time difference, weekends, holidays, etc.). This formula was written in COBOL with no documentation. I was trying to convert it to C#. The target was to get the same values from both formulas. The bug was in calculating the weekend hours, and it took me the whole 3 days to figure that out.
Couple of days........
It was a horrible bug and the code was mixed between frontend & backend, with several scripts essentially running the same thing / interfering with each other and causing the issue.
It was an inherited OpenCart site.
It was about one month. We used a third-party BPM engine. After a month we identified a memory leak. That was at least possible with WinDbg, which let us identify what was consuming the memory. We found that the Dispose didn't dispose of the internal resources...
Well, when I was working on the original version of Janky in Python, I was working for god remembers how long trying to figure out how to get properties to work.
And when making the RuntDeale prototype in Py, I had a bit of an ordeal with how TKinter wanted to render things.
I don't know the actual number, but a few months. I fixed the bug after taking a 3 week break over the holidays. Came in on January 4th and fixed it in like an hour.
I have some nice embedded programming stories for ya:
Two old colleagues of mine spent about one week on a particular issue:
They were working on a SIP stack (for audio connections/sessions) when suddenly it stopped working completely. After one week it turned out that the PBX (a kind of phone/SIP router) had blacklisted their device for too many failed calls...
I spent about 1.5 months on another issue with a driver for flash memory. TLDR: some bit in a settings register was not set/reset by our driver, so depending on whether the device had used an older driver before, it would work perfectly OR shift everything by 1 byte.
The unfortunate part of embedded programming (at least back then) was that:
Later on we added profilers and proper debugging setups (and a hardfault handler that printed stacktraces. GAMECHANGER!)
Good old arm-none-eabi-gcc days :P
10 years. I still can't figure out why providing the wrong password on Windows login takes 1 minute to process
I'll let you know when I'm done ;)
Ages ago I had to debug an issue in production, a different kind of production, as the app was installed in terminals across 3 giant supermarkets. I had to figure out why the game sometimes gave out more prizes than anticipated. It took me 3 days: I would spend the entire night debugging at home, then install the new version in the terminals the next morning at 7 am before opening hours and observe. I finally found out that I was missing a crucial step, which was to flush the cache so the data would be persisted. The app was written in Adobe btw
Over a year so far on some subtle, intermittent bugs involving lots of different parts of the system.
8 days (around 5 hrs every day). I couldn't get the auto-enumerated headings working in ProseMirror.
I've been debugging this PR's formatting bugs and Git errors for around a week.
Ouch, some of these stories are painful!
3 days, turned out there was a bug in PHP itself. We had to come up with some creative workarounds for that one.
For me, it was months.
I actually wrote about it:
The Magical Password
Jan Wedel ・ Aug 24 '18 ・ 3 min read
3 days before I gave up on the project
In total maybe 3-4 days, but that was spread out over the space of a couple of weeks trying to figure out why something was happening the way it was for specific users.
I tell you when I am done