In sum: three weeks with 2-3 developers. It was a corrupted pointer in a medical device. The issue was really hard to reproduce, and the root cause was even harder to understand. In the end we found three threads which tried to release a pointer, and only one of those three thread implementations was broken. And it was a legacy codebase without support from the authors.
Used to do DevOps before they even called it that way: Linux. Python. Perl. Java. Docker. For fun and profit. CTO level generalist working for a mid-sized tech-centric company.
Dresden, Germany
Depends on how you count, probably. Weeks to months I'd say, for a quite peculiar persistency bug that brought an application server to a screeching halt now and then for no obvious reason. Fixing this was rather trivial as soon as we actually understood what went wrong. 🙂
Front-end developer since 2016. Focused on React with GraphQL while studying software architecture, design patterns, emotional intelligence, and leadership.
A week and a half, trying to override a CSS rule from a .NET Core app that was causing styling issues in a child React-based app. I needed help from another dev for 4 days until we got the fix. Man... that was a challenge on another level for me.
Three weeks. This was when I first started out in web development. I had fixed a bug that prevented the project from building on Heroku, but I kept pushing to the wrong git branch (I used git push instead of git push origin master). So every time I pushed to Heroku again, the build failed, over and over. I have never made that mistake again.
Over the years, time scales have shrunk for both release cycles and debugging. I started programming on hard real-time systems, using assembly code. Back then we would normally achieve one (occasionally two) releases a year. The system was expected to be in service for a minimum of 10 years.
I can think of one intermittent fault on an interface between two systems which took me nearly a decade to find. This wasn't continuous effort, but I had at least three attempts at resolving it. By the time I started looking at the bug, the system was already a legacy system, with a replacement contracted through our competitor on its way. I was a junior engineer known for having an aptitude for low-level coding, so I was put to work. Not having access to one of the two systems, I could only review the code and write a report.
Two or three years later our customer dug up my report and agreed we could have access to both systems. The catch was that I only had a single day on site, at the opposite end of the country. The nature of intermittent faults is that the fault will not occur while you are debugging. True to form, the system ran perfectly for the whole day and no useful information was gained.
As luck would have it, our competitor failed to deliver on promises, and our legacy system got a life extension and was rehosted on new hardware. I led the software effort and it went into service, at which point I left the project. Once in service, the original intermittent fault came back with a vengeance, and our customer was not happy. I got seconded back to help fix the issue. We enhanced our simulator to emulate the other system and started debugging. Eventually I found the issue, which we traced to the use of the wrong entry point in an error recovery routine in the real-time OS. The programmer, some 22 years earlier, had typed a 5 instead of a 3. The junior engineer who modified the simulator for me was younger than the bug! Having fixed the bug, I was reminded of the report I had written nearly 10 years before, which correctly pointed to the exact error routine at fault.
I graduated in 1990 in Electrical Engineering and since then I have been in university, doing research in the field of DSP. To me programming is more a tool than a job.
A couple of days; an issue with pointers in a C program.
It began as usual: the program dies with a segmentation fault, open it with the debugger to check where the fault happened and... the stack is a nonsensical mess. Ouch. This is not a good sign, stinks of dangling pointers or similar.
In cases like this the actual error can be anywhere and it can take a veeery long time to find the actual bug. It turned out that the problem was not with just a pointer, but with a pointer to pointer to pointer to ... three or four levels deep.
I am soooo happy that I now code in Ada and not in C anymore.
Ha! I have a painful one. Took me nearly a good 3 days to find it. (between tackling other stuff when I hit a wall)
Note: I'm in GMT timezone.
The company I work for has an analytics view which takes a deep dive into the analytics of the media the company serves. In November 2019, we got a message from a client saying the numbers from our Excel download functionality didn't match those of their internal systems.
The numbers started off fine, but then massively increased after an arbitrary date. (clue 1)
The client was on the west coast of America, and we provide all our analytics in UTC time. (clue 2)
The client had multiple occurrences where the analytics were wrong after the arbitrary date. (clue 3)
I didn't have a problem when getting the data. (clue 4).
The problem?
Daylight saving
Without going into specifics: going back an hour and then calling .startOfDay() on that date meant we would end up with two days' worth of data after the daylight saving change.
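For anyone wondering how "go back an hour, then take the start of the day" can double the data, here is a rough Python sketch of one way it might happen. This is not the actual code from that project: start_of_day, buggy_window, window_days and the sample timestamps are all made up for illustration. It assumes the timestamp arrives as 01:00 while daylight saving is in effect and as 00:00 afterwards.

```python
from datetime import datetime, timedelta


def start_of_day(ts: datetime) -> datetime:
    """Hypothetical stand-in for .startOfDay(): snap to midnight of the same day."""
    return ts.replace(hour=0, minute=0, second=0, microsecond=0)


def buggy_window(ts: datetime) -> tuple[datetime, datetime]:
    """Assumed shape of the bug: shift back one hour, then snap to start of day."""
    start = start_of_day(ts - timedelta(hours=1))
    end = start_of_day(ts) + timedelta(days=1)
    return start, end


def window_days(ts: datetime) -> float:
    """Length of the query window in days."""
    start, end = buggy_window(ts)
    return (end - start) / timedelta(days=1)


# Assumption: during daylight saving the day's timestamp arrives as 01:00,
# so going back one hour lands exactly on midnight of the same day.
during_dst = datetime(2019, 10, 1, 1, 0)

# Assumption: after the clocks change the timestamp arrives as 00:00,
# so going back one hour lands on 23:00 of the previous day, and
# start_of_day then snaps to the previous day's midnight.
after_dst = datetime(2019, 11, 5, 0, 0)

print(window_days(during_dst))  # 1.0
print(window_days(after_dst))   # 2.0
```

The window is fine as long as the one-hour shift stays inside the same calendar day, and silently grows to two days once it crosses midnight, which would fit the "fine at first, then massively increased" pattern.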
Latest comments (60)
For me, it was months.
I actually wrote about it:
The Magical Password
Jan Wedel ・ Aug 24 '18 ・ 3 min read
3 days, turned out there was a bug in PHP itself. We had to come up with some creative workarounds for that one.
Painful to find...easy to fix.
10 years. I still can't figure out why providing the wrong password on Windows login takes 1 minute to process