We all know it:
Bugs. Are. Nasty.
I use Endtest to quickly create Automated Tests and execute them on the cross-browser cloud.
You should check out the docs.
Here are the nastiest bugs I've missed and the lessons they taught me:
1. The $0.00 price on all invoices
A few years ago, I was working for this online retail company.
Most of the flows there were automated.
When a user placed an order on the site, the products were picked up from the warehouse and the invoice was generated and printed automatically.
The bug crawled out from under a change made by a colleague, in a completely different section of the platform.
We only tested around the change and then we released it.
They called us from the warehouse to tell us that all the printed invoices had $0.00 on them.
Surprisingly, no one got fired.
We didn't have automated functional tests back then; we only had unit tests.
I wish I had a tool like Endtest back then.
What lesson did I learn?
Testing around the change is only useful to find out if your change works as expected.
No one can predict how a change might affect other areas of the software.
Never release anything without executing the regression test suites.
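To make that lesson concrete, here is a minimal sketch of the kind of regression check that could have flagged those invoices before the release. The function names and the order-line format are made up for illustration; they are not from the platform I worked on.

```python
from decimal import Decimal


def invoice_total(lines):
    """Sum unit_price * quantity over all order lines.

    `lines` is a hypothetical list of (unit_price, quantity) tuples.
    """
    return sum((price * qty for price, qty in lines), Decimal("0"))


def check_invoice(lines):
    """Regression check: a non-empty order must never total zero or less."""
    total = invoice_total(lines)
    if lines and total <= 0:
        raise AssertionError(f"suspicious invoice total: {total}")
    return total
```

A check like this belongs in the regression suite that runs on every release, not only in the tests around the invoicing code, because the change that breaks it may live somewhere else entirely.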
2. The phantom server
At another company, we were working on a feature that could execute scheduled jobs for the users, pretty much like cron-as-a-service.
The users could pick a time for their scheduled job. We had a cron on our side that would run every minute, picking up the jobs that were scheduled for that specific time and executing them.
So, we decided to spin up a production clone, to test the release there.
After the release, users were complaining that their scheduled jobs were being executed twice.
We spent a few days looking through the entire code and added patches with extra checks.
We even modified values from the server configuration!
Nothing fixed it.
I looked for similar issues on Stack Overflow, until I found an answer from someone saying that maybe it’s coming from a different instance.
“Yeah…right, wish it was that easy!”
And then it hit me: we forgot to disable the cron on the production clone…and we also forgot to take down the production clone.
What lesson did I learn?
Sometimes, the trees prevent you from seeing the forest.
That’s why you need a structured approach for testing.
Make a checklist, write some test cases, but make sure to do it before you actually start testing.
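A checklist can even live in code, so it gets executed instead of remembered. Below is a rough sketch for a production clone, based on the two things we forgot in this story. The `CloneEnvironment` fields are hypothetical; in practice you would fill them in from your own infrastructure.

```python
from dataclasses import dataclass


@dataclass
class CloneEnvironment:
    # Hypothetical description of a production clone used for testing.
    name: str
    cron_enabled: bool
    teardown_scheduled: bool


def checklist_failures(env):
    """Return the checklist items that the clone currently fails."""
    failures = []
    if env.cron_enabled:
        failures.append(f"{env.name}: scheduler cron is still enabled")
    if not env.teardown_scheduled:
        failures.append(f"{env.name}: no teardown is scheduled")
    return failures
```

An empty list means the clone is safe to test on. Anything else should block the test session before it starts.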
3. The million-dollar input
This other company that I worked for had contracts with all sorts of government agencies, contracts worth billions of dollars.
Each delivery phase, obviously, had a deadline.
The penalties for not delivering the software on time were significant.
The company tried to keep things as cheap as possible, with as few employees as possible; this led to patches being added hours before the UAT sessions.
In case you’re not familiar with the term, User Acceptance Testing (UAT), also known as beta or end-user testing, is defined as testing the software by the user or client to determine whether it can be accepted or not.
These sessions were critical, consisting of guiding a lot of government big shots through the process of checking if the system works.
You had to book these weeks in advance, to find a window in their busy schedule.
If a significant issue was found, they wouldn’t sign the documents and the penalties would be applied.
Of course, we were smart enough to have automated tests and I was the one writing them.
As it was a web application, I just used Selenium and executed the tests in the browsers on my own Windows machine, Internet Explorer included.
We also had a DevOps guy there, who insisted that the tests should be executed from our Jenkins server.
I didn’t even know that was possible, and he mumbled something about a headless Chrome browser on a Linux machine.
A patch was added, one day before the UAT.
We were confident, because the automated tests were checking that area and the full regression didn't reveal any issues.
First day of the UAT: everyone noticed an issue where a form couldn’t be submitted, due to some broken input.
Our response?
Just clear your browser cache.
Still not working.
Let me check.
Oh, this is bad.
That penalty was close to one million dollars.
Good news is that the company was prepared for that possibility from the start, so it wasn’t a tragedy.
What lesson did I learn?
Always test in real conditions. Always.
If you’re going to test on a headless browser on some Linux machine, you are taking a risk.
If you are using some cloud solution providing a Chrome browser without telling you which OS it runs on, it’s probably a headless browser on some Linux machine.
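If you are not sure what your tests actually run on, one rough way to find out is to look at the user agent the browser reports. The sketch below is only a heuristic and the function name is my own: older headless Chrome builds reported a `HeadlessChrome` token, but a missing token does not prove that the browser has a real window.

```python
def describe_browser(user_agent):
    """Guess the OS and a headless hint from a user agent string (heuristic)."""
    ua = user_agent.lower()
    # Order matters: Android user agents also contain "linux",
    # and iOS user agents also contain "mac os x".
    if "windows nt" in ua:
        os_name = "Windows"
    elif "iphone" in ua or "ipad" in ua:
        os_name = "iOS"
    elif "mac os x" in ua:
        os_name = "macOS"
    elif "android" in ua:
        os_name = "Android"
    elif "linux" in ua:
        os_name = "Linux"
    else:
        os_name = "unknown"
    return {"os": os_name, "headless_hint": "headlesschrome" in ua}
```

With Selenium, the string can be read from a running session with driver.execute_script("return navigator.userAgent") and logged at the start of every test run.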
At the company I currently work for, we use Endtest.
It provides real browsers on Windows and macOS machines.
Honestly, it does help me sleep at night.
I wouldn’t know if it’s the right tool for you, but for us it did wonders.
What are the ugliest bugs YOU missed?
I’m looking forward to hearing other stories.
Top comments (6)
We had this bug with Google IMA on Android TVs. No matter which config we passed, the SDK was always picking the worst quality ad video. After looking at the minified code, it turned out that, if the navigator user agent matched Android, it was doing an array shift and removing the high quality video. Took me a while xD
Thank you for sharing that story!
Never as bad as yours, but I was dipping in and out of a project, trying to work with the logic of at least 7 independent developers and constantly having to re-implement PayPal because for some reason they loved changing it, which led to chasing bugs out of the whole thing. It cascaded every time!!
A few weeks ago the client asked me to do some more work because something was broken. I flipped open the console and, yet again, someone had changed the PayPal integration. I said no way mate 😂😂
Thank you for sharing that story.
My heart starts racing only when I read about a bug in a payment section.
I guess you have some kind of advanced integration with PayPal, not just the "Buy now" button.
This is the sad part, but no, it was just buttons. This and the Firebase authentication were the two things they kept playing with even though they worked. Once it was an XSS attempt. I don't know why they were billing him to touch parts that were done and then do some work elsewhere. The whole environment was volatile and needed a rebuild, but he's one of those clients.....