A Battle-Scarred Senior SDET's Perspective
When I first moved into test automation, I thought I was doing well.
I could write clean code. I built frameworks I felt proud of. Most of my pipelines were green. I was even catching the occasional bug before customers ever saw it. At the time, that felt like success.
Looking back, that view was narrow. I measured myself by output instead of impact. I focused on what I could build, not on how that work helped teams ship safely or make better decisions. "Catching bugs" became the goal, rather than understanding risk.
Over time though, I saw the same problems repeat across teams, tools, and companies. The details changed. The patterns did not.
Automation is not about tests that pass. It is about helping teams understand risk and ship with confidence. Tests are just one slice of a larger system, and if you only focus on the test results, you'll miss the bigger picture entirely, as I did early on.
With that in mind, these are the lessons I wish I had learned sooner.
Strong coding skills are necessary, but not sufficient
My path into QA was highly unconventional.
I started as a full-stack developer after graduating with a computer science degree. My mindset was code first, code always. I fell into QA almost by accident (let's grab a coffee sometime and trade career stories), and I'm grateful it happened. But early on, my identity lived entirely in the code. I focused almost obsessively on writing elegant test frameworks. Clean architecture. Reusable helpers. Beautiful setup APIs. Everything pristine. If the framework looked slick, I felt successful.
Now don't get me wrong, that work matters. Strong coding skills are required for test suites that scale, stay readable, and survive change. You cannot build serious automation without them. But code should not be the first focus. It is a tool, not the goal.
Strong code can hide weak thinking, and nobody ever taught me how to think about quality. That is not something you learn in school.
You can write clean tests and still miss the failures that hurt users. Effective testing starts with understanding:
- risk
- system behavior
- where failures are likely to occur
- business impact
When you shift from solely focusing on writing code to thinking about how the system actually fails, your impact changes. The code supports the thinking, not the other way around. This is why I believe so many developers struggle to write effective automated tests. They can code just fine, but the thinking can be shallow. And when the thinking lacks depth, you end up with a very nice test that delivers no business value.
It took me a long time to learn this. I still take pride in my technical skills, but I learned the hard way that strong automation starts with a strong quality mindset. No amount of clever code can compensate for the absence of it.
How flaky tests are handled reflects leadership
The first time someone said "just rerun it," I didn't push back.
It was "one of those tests." A little finicky. We joked that it was having a bad day and needed more coffee. Rerun it, get green, move on. It seems so insignificant at the time but over the years I've realized this mindset is quite damaging.
When teams accept flakes, failures lose meaning. Engineers stop trusting results. Automation turns into noise. You still run the tests, but you stop taking them seriously.
Let's be clear, flaky tests do not reflect weak engineers. They reflect weak decisions about ownership and priorities. Flake will always exist. You cannot eliminate it completely. It is a by-product of distributed systems, async behavior, environments, data, and timing.
Leadership is not just about preventing flake. It is about how you respond when it happens.
Ignoring flakes, hiding them, or rerunning until green are leadership choices. Fixing them requires explicit ownership and real trade-offs. The moment you decide to tolerate noise in your test suite is the moment you accept a brittle quality culture.
At a senior level, this becomes your responsibility. Flakiness must be treated like any other defect, with root cause, follow-up, and prevention. You must protect your green runs and address any failure, of any type, with urgency. Otherwise, over time, you slowly lower your quality bar without realizing it.
This may sound extreme, but pause and think about it. I genuinely believe flaky tests are worse than having no tests at all. Noise distracts teams from work that matters. Being complacent with flakes is a subtle form of accepting failure. One path is easy. The other leads to long-term reliability and trust.
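Treating flakes like defects starts with measuring them. As a rough sketch (the record format here is an assumption, not a real CI API), you can flag any test that both passes and fails against the same commit, since the code under test didn't change between those runs:

```python
from collections import defaultdict

def find_flaky_tests(runs):
    """Identify tests with mixed pass/fail results on the same commit.

    `runs` is a list of (test_name, commit_sha, passed) tuples,
    e.g. collected from your CI result reports.
    """
    outcomes = defaultdict(set)  # (test, commit) -> set of observed results
    for test, commit, passed in runs:
        outcomes[(test, commit)].add(passed)

    # A test is flaky if, for some single commit, it both passed and failed.
    return sorted({test for (test, _), seen in outcomes.items()
                   if len(seen) == 2})

runs = [
    ("test_login", "abc123", True),
    ("test_login", "abc123", False),     # same commit, different result
    ("test_checkout", "abc123", True),
    ("test_checkout", "def456", False),  # failed, but the code changed
]
print(find_flaky_tests(runs))  # ['test_login']
```

From there, flagged tests can be quarantined into a separate job and tracked with the same ownership and severity process as product defects.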
CI pipelines matter as much as the tests
I didn't realize it early in my career, but how your tests run is just as important as how they are written.
You can have the most perfectly stable test that runs locally and passes 100/100 times, but if you don't put the same effort and thinking into the pipeline, what you will end up seeing is:
- slow execution
- unclear failures
- unreliable test results
Now all of a sudden, a test suite that was valuable becomes noise and a giant liability.
And it only gets worse. When pipelines are slow or confusing, teams stop using automation as a decision tool, which is its entire purpose. Something that used to be valuable turns into a never-ending abyss of tech debt. Tests must be fast and predictable, and I learned this the hard way.
In my first role as an SDET, I focused on building a strong local test suite from scratch. Within a month, I had around 50 E2E tests covering most of the application's functionality. They looked great locally. They passed reliably. When I finally hooked them up to CI, everything fell apart. The same tests were now running against different environments, different data, and different performance factors. Instead of building confidence, I suddenly had 50 unstable tests and no pipeline maturity to support them. I ended up retroactively putting out 50 fires instead of starting with a small, reliable subset and building the pipeline correctly from the beginning.
I wish I had spent more time early on learning CI/CD fundamentals and pipeline best practices. Turning a local test suite into a reliable pipeline powerhouse takes skill, and it's a skillset I underestimated for far too long.
Automation is a product and maintenance is the work
This lesson took me longer than I care to admit.
If you start to think about your automation suite the way you would think about any product, something changes. You quickly realize you have users, just as your business has customers. In this case, those users are developers.
Once you accept that your automation has users, you start designing for them. Readability matters. Supportability matters. Stability matters. Test code is not second-class code. It runs in production pipelines, blocks releases, and influences real decisions. Treating it as "just test code" is how suites rot over time.
Just like a real product, if it is broken, unreliable, or constantly lying, nobody will use it. Treat automation like a real product:
- define ownership
- classify failure types and severities
- track stability and runtime
- plan improvements over time through tickets and roadmap work
Production systems have roadmaps, metrics, and refactoring cycles. Automation needs the same discipline to stay useful.
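The "track stability and runtime" bullet can be as simple as a script over exported results. A minimal sketch, assuming you already collect per-run records (the field names below are made up for illustration):

```python
from statistics import mean

def suite_health(results):
    """Summarize per-test pass rate and average duration.

    `results` is a list of dicts like
    {"test": "test_login", "passed": True, "seconds": 2.0}.
    """
    by_test = {}
    for r in results:
        by_test.setdefault(r["test"], []).append(r)

    report = {}
    for test, rs in by_test.items():
        report[test] = {
            "pass_rate": sum(r["passed"] for r in rs) / len(rs),
            "avg_seconds": round(mean(r["seconds"] for r in rs), 2),
        }
    return report

results = [
    {"test": "test_login", "passed": True, "seconds": 2.0},
    {"test": "test_login", "passed": False, "seconds": 2.2},
    {"test": "test_search", "passed": True, "seconds": 0.5},
]
report = suite_health(results)
print(report["test_login"])  # {'pass_rate': 0.5, 'avg_seconds': 2.1}
```

Even a crude report like this gives you the metrics and trend lines a real product would have, and makes "which tests are eroding trust" a data question instead of a feeling.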
In QA, we expect the product to meet a high quality bar. If we do not hold our own automation to the same standard, what message are we sending?
Debugging is a skill that must be built on purpose
Debugging is rarely taught.
Most people learn it retroactively through repetition, frustration and mistakes. At this point in my career, troubleshooting is one of the biggest gaps I see between junior and senior engineers.
Debugging is a mindset. While experience helps, you do not need decades to improve it. You can reshape how you debug today.
I like to think of debugging as solving a murder mystery. As a detective, is it easier to solve the case with a hundred suspects or three? The same is true of test failures: how you limit the suspects, or variables, matters. This is why starting from a known-good baseline is invaluable. If everything is already broken, you are not debugging. You are firefighting.
Senior engineers practice debugging intentionally. They reflect on their approach. They form hypotheses, rule things out, and trace failures across layers. Debugging deserves the same respect as framework design or test strategy and it's often overlooked until the moment it's needed.
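The "limit the suspects" idea is exactly why `git bisect` works: a binary search over history turns a hundred suspects into a handful of checks. A toy version of that search, assuming you supply an `is_bad` predicate (e.g. "does this commit reproduce the failure?"):

```python
def first_bad(commits, is_bad):
    """Binary search for the first bad commit.

    Assumes `commits` is ordered oldest-to-newest, the first commit
    is known good, the last is known bad, and there is a single
    good-to-bad flip somewhere in between.
    """
    lo, hi = 0, len(commits) - 1  # lo is known good, hi is known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # failure reproduces: first bad commit is at or before mid
        else:
            lo = mid  # still good: first bad commit is after mid
    return commits[hi]

# 100 suspects, roughly 7 checks instead of 100.
commits = list(range(100))
print(first_bad(commits, lambda c: c >= 42))  # 42
```

The same narrowing discipline applies outside version control: change one variable at a time, keep everything else at a known-good state, and each check eliminates half the suspects.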
Final thought
It took me years to learn these lessons. In some ways, I'm still learning them.
I've come to appreciate that feeling. It means my assumptions are still being tested. My thinking is still evolving. My feedback loop is still alive.
In a way, it's my own continuous integration. Every failure teaches me something. Every correction makes the system a little stronger.
If I could go back, I wouldn't tell myself to learn another tool.
I mean, I still would. Tools are fun.
But it wouldn't be my first priority.
I would say:
- learn how systems fail
- learn how teams make decisions
- learn how to influence without authority
- learn how to think in "quality" rather than thinking in test results
Technical skills grow with time. Impact comes from everything around them.
If you are early in your career and reading this, you are doing better than you think. Feeling uncertain often means you are learning faster than your confidence can keep up.
Even senior engineers feel it.
Especially the ones who say they do not.
Remember, good automation doesn't just catch bugs. It helps teams make better decisions.
Happy testing and learning.