<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Tudor Brad</title>
    <description>The latest articles on DEV Community by Tudor Brad (@tudorsss-betterqa).</description>
    <link>https://dev.to/tudorsss-betterqa</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3869055%2Ffed6c014-e6c6-43ea-833a-18fa21d3158d.png</url>
      <title>DEV Community: Tudor Brad</title>
      <link>https://dev.to/tudorsss-betterqa</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/tudorsss-betterqa"/>
    <language>en</language>
    <item>
      <title>Fuzz testing found bugs in our API that unit tests never would</title>
      <dc:creator>Tudor Brad</dc:creator>
      <pubDate>Thu, 09 Apr 2026 08:34:50 +0000</pubDate>
      <link>https://dev.to/tudorsss-betterqa/fuzz-testing-found-bugs-in-our-api-that-unit-tests-never-would-1a39</link>
      <guid>https://dev.to/tudorsss-betterqa/fuzz-testing-found-bugs-in-our-api-that-unit-tests-never-would-1a39</guid>
      <description>&lt;p&gt;I used to think our test suites were solid. We had unit tests, integration tests, contract tests for the API layer. Good coverage numbers. The kind of setup that makes you feel safe when you merge to main on a Friday afternoon.&lt;/p&gt;

&lt;p&gt;Then we ran a fuzzer against the same API and watched it fall apart in under an hour.&lt;/p&gt;

&lt;p&gt;Fourteen crashes. Server panics on malformed JSON. A file upload endpoint that accepted literally anything as long as you set the right Content-Type header. An input field on a form that crashed the entire backend process when it received a float instead of an integer.&lt;/p&gt;

&lt;p&gt;None of these showed up in our existing tests. Not one.&lt;/p&gt;

&lt;p&gt;That was the day I stopped treating fuzzing as a "nice to have" and started treating it as the part of security testing that actually finds the bugs hiding between your test cases.&lt;/p&gt;

&lt;h3&gt;
  
  
  What fuzzing actually does
&lt;/h3&gt;

&lt;p&gt;Fuzzing is simple in concept. You throw garbage at your software and see what breaks.&lt;/p&gt;

&lt;p&gt;More precisely: you take valid inputs, mutate them in thousands of ways (wrong types, oversized strings, null bytes, nested objects 500 levels deep, unicode edge cases, truncated payloads), and send them at your application as fast as you can. Then you watch for crashes, hangs, memory leaks, unexpected error codes, and data that leaks out in error messages.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://owasp.org/www-community/Fuzzing" rel="noopener noreferrer"&gt;OWASP fuzzing page&lt;/a&gt; describes the technique well if you want the textbook version. But here is what it looks like in practice: you point a tool at an endpoint, go make coffee, and come back to a list of inputs that made your software do something it should not have done.&lt;/p&gt;

&lt;p&gt;The reason this works so well is that developers test for what they expect. You write a test that sends valid JSON and checks the response. Maybe you write a test that sends empty JSON and checks for a 400 error. But you probably do not write a test that sends JSON with a key that is 50,000 characters long, or a nested array 200 levels deep, or a number where a string should be with a trailing null byte.&lt;/p&gt;

&lt;p&gt;Fuzzers do not have expectations. They just try things. And software has a lot of assumptions baked into it that only surface when those assumptions get violated.&lt;/p&gt;
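&lt;p&gt;To make the mutation idea concrete, here is a minimal sketch in plain Node. The seed payload, field names, and the specific mutation set are illustrative; a real fuzzer generates far more variants and actually sends them:&lt;/p&gt;

```javascript
// Minimal mutation-fuzzer sketch. Seed payload, field names, and the
// mutation set are illustrative; real fuzzers generate far more variants.

// Build an object nested `depth` levels deep: { k: { k: { ... } } }
function deepNest(depth) {
  let value = 'x';
  for (let i = depth; i > 0; i--) value = { k: value };
  return value;
}

// Take one valid payload and emit variants that violate common assumptions.
function mutations(valid) {
  const variants = [];
  for (const key of Object.keys(valid)) {
    const push = (value) => variants.push({ ...valid, [key]: value });
    push(12345);             // wrong type: number where a string is expected
    push('A'.repeat(50000)); // oversized string
    push(null);              // null where a value is expected
    push('x\u0000y');        // embedded null byte
    push(deepNest(200));     // deeply nested object
    push(2147483648);        // one past the 32-bit signed boundary
  }
  return variants;
}

const seed = { name: 'Ada', phone: '+40123456789' };
const payloads = mutations(seed); // 6 mutations per field
// Each payload would then be POSTed to the target endpoint while the
// response code, body, and timing are recorded. Sending is omitted here.
```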

&lt;h3&gt;
  
  
  The bugs fuzzing catches that nothing else does
&lt;/h3&gt;

&lt;p&gt;Let me walk through the actual categories of failures we find during fuzz testing engagements. These are real patterns from real projects.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Input type confusion.&lt;/strong&gt; A registration form expects a string for the phone number field. The API handler parses it and passes it to a validation function that calls &lt;code&gt;.match()&lt;/code&gt; on it. Send an integer instead of a string and the backend throws an unhandled TypeError. The server returns a 500 with a stack trace that includes the file path and line number. Now an attacker knows your framework, your file structure, and exactly where to probe next.&lt;/p&gt;

&lt;p&gt;Unit tests rarely cover this because the developer wrote the test with the same mental model they used to write the code. They send a string because that is what the field is for.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Malformed JSON handling.&lt;/strong&gt; We see this constantly. APIs that parse JSON request bodies without validating the structure first. Send &lt;code&gt;{"user": {"name": {"name": {"name": ...}}}}&lt;/code&gt; nested 100 times and the server either runs out of memory or hits a recursion limit and crashes. Send JSON with a trailing comma (technically invalid) and some parsers accept it while others throw. Send a 10MB payload to an endpoint that expects 200 bytes and there is no size limit enforced.&lt;/p&gt;

&lt;p&gt;These are not exotic attacks. They are basic robustness issues that every public-facing API should handle. Fuzzers find them in minutes.&lt;/p&gt;
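&lt;p&gt;A sketch of what handling them can look like: guard the raw body before business logic ever sees it. The limits below (64 KB, depth 20) are illustrative, not recommendations for every API:&lt;/p&gt;

```javascript
// Guard a raw JSON body before business logic sees it: cap the size and
// the nesting depth. The limits here are illustrative, not universal.

const MAX_BODY_BYTES = 64 * 1024; // map violations to HTTP 413
const MAX_DEPTH = 20;             // map violations to HTTP 400

// Depth of the deepest object/array nesting in an already-parsed value.
function jsonDepth(value, depth = 0) {
  if (value === null) return depth;
  if (typeof value !== 'object') return depth;
  let max = depth;
  for (const child of Object.values(value)) {
    max = Math.max(max, jsonDepth(child, depth + 1));
  }
  return max;
}

// Returns the parsed object or throws with a client-safe message.
function safeParse(rawBody) {
  if (Buffer.byteLength(rawBody, 'utf8') > MAX_BODY_BYTES) {
    throw new Error('payload too large');
  }
  let parsed;
  try {
    parsed = JSON.parse(rawBody);
  } catch {
    throw new Error('malformed JSON');
  }
  if (jsonDepth(parsed) > MAX_DEPTH) {
    throw new Error('payload too deeply nested');
  }
  return parsed;
}
```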

&lt;p&gt;&lt;strong&gt;File upload validation gaps.&lt;/strong&gt; This one is a classic. An endpoint says it accepts PNG files. It checks the Content-Type header. It does not check the actual file content. So you can upload a PHP script, a shell script, or an SVG containing embedded JavaScript, and the server happily stores it. Depending on the server configuration, that file might be directly executable.&lt;/p&gt;

&lt;p&gt;We tested a client's document upload feature and found that it validated the file extension in the filename but not the actual bytes. Rename &lt;code&gt;malicious.php&lt;/code&gt; to &lt;code&gt;malicious.php.png&lt;/code&gt; and it went straight through.&lt;/p&gt;
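&lt;p&gt;The fix is to sniff the actual bytes. A minimal sketch using the real PNG and JPEG signatures; a production system would cover more formats or use a maintained detection library:&lt;/p&gt;

```javascript
// Validate upload content by magic bytes, not Content-Type or filename.
// The PNG and JPEG signatures below are the real ones.

const SIGNATURES = {
  png: Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]),
  jpeg: Buffer.from([0xff, 0xd8, 0xff]),
};

// Returns the detected type, or null for anything that should be rejected.
function sniffImageType(fileBuffer) {
  for (const [type, sig] of Object.entries(SIGNATURES)) {
    if (fileBuffer.subarray(0, sig.length).equals(sig)) return type;
  }
  return null;
}

// A renamed PHP script still begins with the PHP open tag, not the PNG
// signature, so it is rejected no matter what the filename says.
```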

&lt;p&gt;&lt;strong&gt;Error message information leakage.&lt;/strong&gt; When software crashes on unexpected input, the error messages often contain information that should never reach the client. Database connection strings, internal IP addresses, full stack traces with dependency versions, SQL query fragments. Fuzzers trigger these crashes systematically, and each crash response becomes a reconnaissance opportunity for an attacker.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Integer overflows and boundary values.&lt;/strong&gt; We worked on a payment processing system where fuzz testing found an integer overflow in the transaction amount field. The field was a 32-bit signed integer. Send a value just past &lt;code&gt;2,147,483,647&lt;/code&gt; and the system wrapped around to a negative number. In a payment context, that could mean a credit instead of a debit. Standard tests sent amounts like 100, 500, 10000. Nobody tested what happens at the boundary of the data type itself.&lt;/p&gt;
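&lt;p&gt;The wraparound is easy to demonstrate. In the sketch below, &lt;code&gt;| 0&lt;/code&gt; coerces to a 32-bit signed integer, standing in for the width of the field; &lt;code&gt;parseAmount&lt;/code&gt; is a hypothetical guard that rejects out-of-range values instead of wrapping:&lt;/p&gt;

```javascript
// Demonstrate 32-bit signed wraparound. `| 0` coerces to a 32-bit signed
// integer, standing in for the width of the amount field described above.

const MAX_INT32 = 2147483647;

function toInt32(amount) {
  return amount | 0; // wraps silently instead of rejecting
}
// toInt32(2147483648) yields -2147483648: a credit instead of a debit.

// A hypothetical guard that rejects out-of-range amounts instead:
function parseAmount(amount) {
  if (!Number.isInteger(amount)) throw new RangeError('amount must be an integer');
  if (0 > amount) throw new RangeError('amount must be non-negative');
  if (amount > MAX_INT32) throw new RangeError('amount exceeds 32-bit range');
  return amount;
}
```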

&lt;h3&gt;
  
  
  Why your existing tests miss these
&lt;/h3&gt;

&lt;p&gt;Your unit tests are written by the same people who wrote the code. They share the same assumptions about what valid input looks like. They test the happy path and a handful of known error cases.&lt;/p&gt;

&lt;p&gt;Your integration tests verify that components work together correctly when given correct data. They rarely test what happens when component A sends garbage to component B.&lt;/p&gt;

&lt;p&gt;Your end-to-end tests simulate real user behavior. Real users do not typically paste 50,000 characters into a phone number field or send raw bytes to a JSON endpoint. Attackers do.&lt;/p&gt;

&lt;p&gt;Fuzzing fills the gap between "does it work correctly?" and "does it fail safely?" Those are two very different questions, and most test suites only answer the first one.&lt;/p&gt;

&lt;h3&gt;
  
  
  How we actually run fuzz tests
&lt;/h3&gt;

&lt;p&gt;At &lt;a href="https://betterqa.co" rel="noopener noreferrer"&gt;BetterQA&lt;/a&gt;, fuzzing is part of our DAST (Dynamic Application Security Testing) work. We built an &lt;a href="https://betterqa.co/software-testing-services/" rel="noopener noreferrer"&gt;AI Security Toolkit&lt;/a&gt; with over 30 scanners, and fuzzing is integrated into the dynamic analysis pipeline.&lt;/p&gt;

&lt;p&gt;Here is how a typical engagement works:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Map the attack surface.&lt;/strong&gt; Before we fuzz anything, we need to know what exists. We crawl the application, identify all endpoints, document the expected input formats, and note which endpoints handle sensitive data (auth, payments, file uploads, admin functions).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Seed the fuzzer with valid inputs.&lt;/strong&gt; Good fuzzing starts with valid data. We capture real requests from the application (with test accounts, never production data), and the fuzzer uses these as templates. It knows what a valid request looks like, so it can make targeted mutations rather than purely random noise.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Run mutation-based fuzzing.&lt;/strong&gt; The fuzzer takes each valid input and generates thousands of variants. Wrong types, boundary values, encoding tricks, oversized payloads, special characters, null bytes, format string patterns. Each variant gets sent to the endpoint, and we capture the response code, response body, response time, and any server-side logs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Triage the findings.&lt;/strong&gt; Not every crash is a security vulnerability. Some are just robustness issues (the server returns a 500 but recovers cleanly). Some are actual security holes (the server leaks data, accepts the malformed input as valid, or enters an inconsistent state). We classify each finding by severity and exploitability.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Verify and document.&lt;/strong&gt; Every finding gets manually verified. We reproduce the crash, confirm the root cause, and write up the fix. No false positives in the final report.&lt;/p&gt;

&lt;p&gt;For web applications, we often use OWASP ZAP as one of the tools in this pipeline. For APIs, we combine custom fuzzing scripts with tools like Burp Suite's Intruder or purpose-built API fuzzers. For projects with unusual protocols (IoT devices, custom binary formats), we write targeted fuzzers from scratch.&lt;/p&gt;

&lt;h3&gt;
  
  
  When to fuzz (and when not to)
&lt;/h3&gt;

&lt;p&gt;Fuzzing works best when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You have a public-facing API that accepts user input&lt;/li&gt;
&lt;li&gt;You process file uploads&lt;/li&gt;
&lt;li&gt;You handle payment or financial data&lt;/li&gt;
&lt;li&gt;You parse complex data formats (JSON, XML, CSV, binary protocols)&lt;/li&gt;
&lt;li&gt;You have already done basic security testing and want to go deeper&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Fuzzing is less useful when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The application has no external input surface (purely internal batch processing)&lt;/li&gt;
&lt;li&gt;You have not done basic input validation yet (fix the obvious stuff first, then fuzz)&lt;/li&gt;
&lt;li&gt;The codebase changes so frequently that findings become stale before they are fixed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The best time to start fuzzing is after your first round of functional testing is stable but before you go to production. That is when the cost of fixing issues is lowest and the risk of missing something is highest.&lt;/p&gt;

&lt;h3&gt;
  
  
  The security testing reality in 2024
&lt;/h3&gt;

&lt;p&gt;As Tudor Brad, BetterQA's founder, puts it: "It's a good versus evil game right now." AI is accelerating development speed, which means more code ships faster, which means more potential vulnerabilities reach production faster. Features that used to take months now take days. The testing has to keep pace.&lt;/p&gt;

&lt;p&gt;Fuzzing is one of the few techniques that scales with code output. You do not need to manually write a test case for every possible malformed input. The fuzzer generates them. You just need to point it at the right targets and have someone who knows what they are looking at triage the results.&lt;/p&gt;


&lt;p&gt;If you have never run a fuzzer against your application, I would strongly suggest trying it on a staging environment. The results will probably surprise you. We have yet to fuzz a non-trivial application and find zero issues. Every single engagement has turned up something the existing test suite missed.&lt;/p&gt;

&lt;p&gt;The question is never "does my software have these bugs?" The question is "do I find them before someone else does?"&lt;/p&gt;

&lt;p&gt;More on security testing and QA practices on the &lt;a href="https://betterqa.co/blog" rel="noopener noreferrer"&gt;BetterQA blog&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>testing</category>
      <category>ai</category>
      <category>security</category>
      <category>devops</category>
    </item>
    <item>
      <title>Payment testing: the card types that break in production</title>
      <dc:creator>Tudor Brad</dc:creator>
      <pubDate>Thu, 09 Apr 2026 08:34:46 +0000</pubDate>
      <link>https://dev.to/tudorsss-betterqa/payment-testing-the-card-types-that-break-in-production-5c1d</link>
      <guid>https://dev.to/tudorsss-betterqa/payment-testing-the-card-types-that-break-in-production-5c1d</guid>
      <description>&lt;h3&gt;
  
  
  The bug that costs you money twice
&lt;/h3&gt;

&lt;p&gt;Last year we tested a fintech client's checkout flow. Everything passed in Stripe test mode. Green across the board. Then they went live in Germany and 30% of transactions started failing silently. No error page. No retry prompt. Just... nothing happened when the user clicked "Pay."&lt;/p&gt;

&lt;p&gt;The problem was 3D Secure. Their integration handled the initial charge request fine, but never implemented the redirect flow for SCA (Strong Customer Authentication). In test mode, Stripe skips 3D Secure unless you explicitly use the &lt;code&gt;4000002760003184&lt;/code&gt; test card. Nobody on the dev team had used that card. So nobody knew the integration was broken for every European card that required authentication.&lt;/p&gt;

&lt;p&gt;The client found out when chargebacks started hitting. That is the worst way to discover a payment bug: your payment processor tells you, your bank tells you, and your users have already left.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why payment bugs are different from other bugs
&lt;/h3&gt;

&lt;p&gt;A broken image on your landing page is embarrassing. A broken payment flow is expensive. Here is what makes payment bugs uniquely painful:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Direct revenue loss.&lt;/strong&gt; Every failed transaction is money that almost entered your account and didn't. If 5% of your transactions fail due to a card type you never tested, that is 5% of revenue gone. Not "at risk." Gone.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Chargebacks compound the damage.&lt;/strong&gt; When a payment goes through incorrectly (wrong amount, duplicate charge, currency mismatch), you don't just refund the money. You pay chargeback fees. Enough chargebacks and your payment processor raises your rates or drops you entirely.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;User trust evaporates instantly.&lt;/strong&gt; People are anxious about money. A single failed payment makes a user question whether your site is legitimate. They won't debug it for you. They will close the tab and buy from someone else.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Silent failures hide the problem.&lt;/strong&gt; Unlike a 500 error that shows up in your monitoring, many payment failures happen at the processor level and return a generic decline. Your logs show "card_declined" but the real cause is that your integration doesn't handle the card network correctly.&lt;/p&gt;

&lt;p&gt;This is why we treat payment testing as its own discipline, not just "form validation with a credit card field."&lt;/p&gt;

&lt;h3&gt;
  
  
  Card types that actually break things
&lt;/h3&gt;

&lt;p&gt;Here are the specific card type issues we run into repeatedly when testing payment integrations for clients.&lt;/p&gt;

&lt;h3&gt;
  
  
  Amex and the 15-digit problem
&lt;/h3&gt;

&lt;p&gt;American Express cards have 15 digits and a 4-digit CVV (called CID). Visa and Mastercard have 16 digits and a 3-digit CVV. This sounds trivial until you see how many integrations hardcode &lt;code&gt;maxLength="16"&lt;/code&gt; on the card number input and &lt;code&gt;maxLength="3"&lt;/code&gt; on the CVV field.&lt;/p&gt;

&lt;p&gt;We tested a SaaS platform where Amex cards were being silently rejected. No error message. The form just wouldn't submit. The frontend validation required exactly 16 digits, so any 15-digit PAN was treated as incomplete. The user saw a disabled submit button and assumed they typed something wrong.&lt;/p&gt;

&lt;p&gt;Test cards to use:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Amex:           3782 822463 10005    (15 digits, 4-digit CID)
Visa:           4242 4242 4242 4242  (16 digits, 3-digit CVV)
Mastercard:     5555 5555 5555 4444  (16 digits, 3-digit CVV)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What to check:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Card number field accepts 15, 16, and 19 digits&lt;/li&gt;
&lt;li&gt;CVV field accepts both 3 and 4 digits&lt;/li&gt;
&lt;li&gt;Card type detection updates dynamically (Amex logo appears when you type &lt;code&gt;37xx&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Backend validation matches frontend rules&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  UnionPay and 19-digit PANs
&lt;/h3&gt;

&lt;p&gt;UnionPay cards can be 16, 17, 18, or 19 digits long. If your validation regex is &lt;code&gt;^\d{16}$&lt;/code&gt;, you are rejecting a card network used by over a billion people.&lt;/p&gt;

&lt;p&gt;We see this constantly in integrations targeting Asian markets. The dev team builds and tests with Visa/Mastercard, launches in Singapore or Malaysia, and gets support tickets from users who "can't enter their card number."&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;UnionPay (19):  6200 0000 0000 0000 003
UnionPay (16):  6200 0000 0000 0005
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The fix is straightforward: accept 13-19 digits and let the payment processor handle network-specific validation. Your frontend should not be the gatekeeper for PAN length.&lt;/p&gt;
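&lt;p&gt;A lenient client-side check can still give fast feedback, as long as it only rejects inputs that are definitely wrong. A sketch using the Luhn checksum plus a 13-19 digit length rule (function names are illustrative):&lt;/p&gt;

```javascript
// Lenient client-side PAN check: strip separators, accept 13-19 digits,
// and run the Luhn checksum. Network-specific rules stay with the
// payment processor.

function luhnValid(digits) {
  let sum = 0;
  let double = false;
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48; // character to digit
    if (double) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
    double = !double;
  }
  return sum % 10 === 0;
}

function panLooksValid(input) {
  const digits = input.replace(/[\s-]/g, ''); // strip spaces and dashes
  if (!/^\d{13,19}$/.test(digits)) return false;
  return luhnValid(digits);
}
// Accepts 15-digit Amex, 14-digit Diners Club, and long UnionPay PANs
// that a hardcoded length-16 check would wrongly reject.
```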

&lt;h3&gt;
  
  
  Diners Club and the 14-digit edge case
&lt;/h3&gt;

&lt;p&gt;Diners Club cards traditionally have 14 digits, though newer ones may have 16. If your system strips spaces and then checks &lt;code&gt;length === 16&lt;/code&gt;, Diners Club users cannot pay.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Diners Club:    3056 9309 0259 04   (14 digits)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This one is less common globally but still matters if you operate in parts of South America or accept corporate cards. We have seen it break on subscription billing platforms where the initial charge worked (the card was tokenized by Stripe directly) but a later recurring charge failed because the platform's own validation ran during a card update flow.&lt;/p&gt;

&lt;h3&gt;
  
  
  3D Secure and SCA failures
&lt;/h3&gt;

&lt;p&gt;This is the big one. 3D Secure (3DS) adds an authentication step where the card issuer verifies the cardholder, usually through a redirect or iframe popup. In the EU, SCA regulations make this mandatory for most online transactions.&lt;/p&gt;

&lt;p&gt;The problem: Stripe's test mode does not trigger 3DS by default. You need to explicitly use test cards that simulate the 3DS flow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;3DS required:       4000 0027 6000 3184
3DS required (fail): 4000 0084 0000 1629
3DS optional:       4000 0025 0000 3155
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What breaks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The redirect URL is not configured, so the user gets sent to a blank page&lt;/li&gt;
&lt;li&gt;The return handler does not check &lt;code&gt;payment_intent.status&lt;/code&gt; after the redirect&lt;/li&gt;
&lt;li&gt;Mobile webviews block the 3DS popup, so the authentication never completes&lt;/li&gt;
&lt;li&gt;The webhook handler does not account for the &lt;code&gt;requires_action&lt;/code&gt; status&lt;/li&gt;
&lt;/ul&gt;
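&lt;p&gt;The second failure above, not checking the status after the redirect, can be isolated into a pure helper, which also makes it testable. The status strings below are real Stripe PaymentIntent statuses; the &lt;code&gt;action&lt;/code&gt; names are illustrative, and retrieving the PaymentIntent itself is omitted:&lt;/p&gt;

```javascript
// Map a PaymentIntent status to an app-level outcome after the 3DS
// redirect. Status strings are real Stripe statuses; `action` names are
// illustrative. Fetching the PaymentIntent from the API is omitted here.

function outcomeForStatus(status) {
  switch (status) {
    case 'succeeded':
      return { ok: true, action: 'showConfirmation' };
    case 'processing':
      return { ok: true, action: 'showPending' }; // final state arrives via webhook
    case 'requires_action':
      return { ok: false, action: 'restartAuthentication' }; // 3DS not completed
    case 'requires_payment_method':
      return { ok: false, action: 'askForAnotherCard' }; // auth or charge failed
    default:
      return { ok: false, action: 'showGenericFailure' };
  }
}
```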

&lt;p&gt;We tested a client's mobile app where 3DS worked perfectly in the browser but failed 100% of the time in the iOS webview. The app's &lt;code&gt;WKWebView&lt;/code&gt; had &lt;code&gt;javaScriptEnabled&lt;/code&gt; set to &lt;code&gt;true&lt;/code&gt; but blocked popups, which is how the 3DS challenge was presented. Every EU user on iOS could not complete a payment.&lt;/p&gt;

&lt;h3&gt;
  
  
  Currency and amount edge cases
&lt;/h3&gt;

&lt;p&gt;Currency bugs are sneaky because they often produce a valid charge for the wrong amount. The user gets billed, the amount looks plausible, and nobody notices until reconciliation.&lt;/p&gt;

&lt;p&gt;Common issues we test for:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Zero-decimal currencies.&lt;/strong&gt; JPY, KRW, and several others do not use decimal subunits. Stripe expects amounts in the smallest currency unit, so &lt;code&gt;1000&lt;/code&gt; means 10.00 USD when the currency is USD (cents) but 1000 yen when it is JPY (whole yen). Code that assumes every currency has two decimal places charges the wrong amount the moment a zero-decimal currency appears. The interpretation of the amount field changes with the currency.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# USD: $10.00 = 1000 (cents)
# JPY: 1000 yen = 1000 (no subunit)
# BHD: 10.000 BD = 10000 (three decimal places)
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Rounding on conversion.&lt;/strong&gt; If your platform shows prices in EUR but charges in USD after conversion, rounding differences can mean the user sees 9.99 EUR but gets charged 10.01 EUR equivalent. Small difference. Big trust problem.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Minimum charge amounts.&lt;/strong&gt; Stripe requires a minimum of 50 cents USD (or equivalent). If your platform allows a 0.10 USD tip or a discount that reduces the charge below the minimum, the payment fails at the processor level with a generic error.&lt;/p&gt;
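&lt;p&gt;A small helper makes the subunit rule explicit instead of scattering multiply-by-100 logic through the codebase. The exponent table below is deliberately partial and illustrative; the full mapping lives in ISO 4217 and your processor's documentation:&lt;/p&gt;

```javascript
// Convert a display amount to processor minor units per currency.
// The exponent table is partial and illustrative; the full mapping lives
// in ISO 4217 and in the payment processor's documentation.

const CURRENCY_EXPONENT = { USD: 2, EUR: 2, JPY: 0, KRW: 0, BHD: 3 };

function toMinorUnits(amount, currency) {
  const exp = CURRENCY_EXPONENT[currency];
  if (exp === undefined) throw new Error(`unknown currency: ${currency}`);
  return Math.round(amount * 10 ** exp); // round to dodge float artifacts
}
// toMinorUnits(10, 'USD')   -> 1000  (cents)
// toMinorUnits(1000, 'JPY') -> 1000  (no subunit)
// toMinorUnits(10, 'BHD')   -> 10000 (three decimal places)
```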

&lt;h3&gt;
  
  
  How we structure payment test suites
&lt;/h3&gt;

&lt;p&gt;When we pick up a payment integration project, here is the sequence we follow. This is not theory. This is what we actually run.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Phase 1: Card type coverage matrix.&lt;/strong&gt; We build a grid of every card network the client wants to support, crossed with every payment scenario (one-time charge, subscription, refund, partial refund, card update). Each cell gets tested. No assumptions that "if Visa works, Mastercard works."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Phase 2: Authentication flows.&lt;/strong&gt; We test every 3DS path: success, failure, abandonment (user closes the popup), timeout, and network error during redirect. We test on desktop browsers, mobile browsers, and in-app webviews separately because they behave differently.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Phase 3: Error handling and messaging.&lt;/strong&gt; We trigger every decline code Stripe can return (insufficient funds, expired card, incorrect CVV, processing error, card not supported) and verify the user sees a specific, actionable message. "Payment failed" is not acceptable. "Your card was declined. Please check your card details or try a different payment method" is the minimum.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Phase 4: Webhook reliability.&lt;/strong&gt; We verify that payment confirmation does not depend solely on the client-side redirect. If the user closes their browser after 3DS but before the redirect completes, the webhook from Stripe should still update the order. We test this by intentionally killing the browser session mid-payment and confirming the backend processes the webhook correctly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Phase 5: Currency and locale.&lt;/strong&gt; We test with cards issued in different countries, in different currencies, with different locale settings on the browser. A Japanese user with a JPY card on a platform that prices in USD should see a coherent experience from price display through to their bank statement.&lt;/p&gt;

&lt;h3&gt;
  
  
  Stripe test cards quick reference
&lt;/h3&gt;

&lt;p&gt;For developers setting up their own payment test suites, here are the cards we use most often:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;Card number&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Success&lt;/td&gt;
&lt;td&gt;&lt;code&gt;4242 4242 4242 4242&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Always succeeds&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Generic decline&lt;/td&gt;
&lt;td&gt;&lt;code&gt;4000 0000 0000 0002&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Always declined&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Insufficient funds&lt;/td&gt;
&lt;td&gt;&lt;code&gt;4000 0000 0000 9995&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Specific decline reason&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Incorrect CVC&lt;/td&gt;
&lt;td&gt;&lt;code&gt;4000 0000 0000 0127&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;CVC check fails&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Expired card&lt;/td&gt;
&lt;td&gt;&lt;code&gt;4000 0000 0000 0069&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Expiry check fails&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3DS required&lt;/td&gt;
&lt;td&gt;&lt;code&gt;4000 0027 6000 3184&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Triggers authentication&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3DS failure&lt;/td&gt;
&lt;td&gt;&lt;code&gt;4000 0084 0000 1629&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Authentication fails&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Amex&lt;/td&gt;
&lt;td&gt;&lt;code&gt;3782 822463 10005&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;15 digits, 4-digit CID&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Dispute/chargeback&lt;/td&gt;
&lt;td&gt;&lt;code&gt;4000 0000 0000 0259&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Triggers dispute&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Use any future expiry date and any 3-digit CVC (4-digit for Amex). For full documentation, check &lt;a href="https://docs.stripe.com/testing" rel="noopener noreferrer"&gt;Stripe's testing page&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  The test mode trap
&lt;/h3&gt;

&lt;p&gt;Here is the pattern we see over and over: a team builds a payment integration, tests it thoroughly in Stripe test mode, and ships it. Then production breaks in ways that test mode never revealed.&lt;/p&gt;

&lt;p&gt;Test mode is not production. It does not enforce SCA. It does not check real BIN ranges. It does not apply real fraud detection rules. It does not connect to actual card networks. It is a simulation, and like all simulations, it has blind spots.&lt;/p&gt;

&lt;p&gt;The gap between test mode and production is where payment bugs live. You can narrow that gap by using the right test cards, testing authentication flows explicitly, and verifying webhook handling under failure conditions. But you cannot eliminate it entirely without production monitoring.&lt;/p&gt;

&lt;p&gt;We always recommend that clients set up real-time alerting on payment failure rates. A 2% failure rate on day one that creeps to 8% by day thirty means something changed at the processor or issuer level, and no amount of pre-launch testing catches that.&lt;/p&gt;

&lt;h3&gt;
  
  
  What we have learned from testing payments across clients
&lt;/h3&gt;

&lt;p&gt;After testing payment integrations for fintech and e-commerce clients at &lt;a href="https://betterqa.co" rel="noopener noreferrer"&gt;BetterQA&lt;/a&gt;, a few things stand out:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Card type validation belongs at the processor level, not your frontend.&lt;/strong&gt; Let Stripe or Adyen validate the PAN. Your job is to not block valid cards before they reach the processor.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;3D Secure is not optional in Europe.&lt;/strong&gt; If you sell to EU customers and your integration does not handle 3DS, you will lose transactions. Not might. Will.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Test the sad paths harder than the happy paths.&lt;/strong&gt; A successful payment needs to work. A failed payment needs to communicate clearly. Most teams spend 90% of testing time on success and 10% on failure. We flip that ratio.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Webhooks are your safety net.&lt;/strong&gt; Client-side confirmation is unreliable. Browsers crash, users close tabs, networks drop. Your backend must handle payment confirmation through webhooks independently of what happens in the browser.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Currency handling is a category of bugs, not a single check.&lt;/strong&gt; Zero-decimal currencies, three-decimal currencies, conversion rounding, minimum amounts: each one is a distinct failure mode.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Payment bugs are expensive, embarrassing, and preventable. The card types and scenarios in this article are the ones we see break most often. Test them before your users find them for you.&lt;/p&gt;

&lt;p&gt;More on how we approach QA for complex integrations: &lt;a href="https://betterqa.co/blog" rel="noopener noreferrer"&gt;betterqa.co/blog&lt;/a&gt;&lt;/p&gt;

</description>
      <category>testing</category>
      <category>webdev</category>
    </item>
    <item>
      <title>The automation mistakes we keep fixing on inherited test suites</title>
      <dc:creator>Tudor Brad</dc:creator>
      <pubDate>Thu, 09 Apr 2026 08:29:18 +0000</pubDate>
      <link>https://dev.to/tudorsss-betterqa/the-automation-mistakes-we-keep-fixing-on-inherited-test-suites-54dh</link>
      <guid>https://dev.to/tudorsss-betterqa/the-automation-mistakes-we-keep-fixing-on-inherited-test-suites-54dh</guid>
      <description>&lt;p&gt;I have inherited a lot of test suites. Some were built by contractors. Some were built by developers who drew the short straw. A few were started by QA engineers who left the company before anyone else learned how the framework worked.&lt;/p&gt;

&lt;p&gt;They all break in the same ways.&lt;/p&gt;

&lt;p&gt;At BetterQA, automation suite maintenance is a significant chunk of our work. We build suites from scratch, yes, but we also take over existing ones. And after years of doing this across dozens of clients and tech stacks, I can tell you the failure modes are remarkably consistent.&lt;/p&gt;

&lt;p&gt;Here are the mistakes I keep seeing, what they actually cost, and how we fix them.&lt;/p&gt;

&lt;h3&gt;
  
  
  Hardcoded waits everywhere
&lt;/h3&gt;

&lt;p&gt;This is the single most common problem. Open any inherited suite and you will find &lt;code&gt;sleep(5000)&lt;/code&gt; or &lt;code&gt;cy.wait(5000)&lt;/code&gt; or &lt;code&gt;time.sleep(5)&lt;/code&gt; scattered through the code like confetti.&lt;/p&gt;

&lt;p&gt;I understand why it happens. A test is flaky. The page takes a moment to load. Someone adds a wait, the test passes, the PR gets merged. Problem solved, right?&lt;/p&gt;

&lt;p&gt;No. Problem deferred.&lt;/p&gt;

&lt;p&gt;Here is what hardcoded waits actually cost you:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;They make your suite slow.&lt;/strong&gt; A 5-second wait runs for 5 seconds whether the element appeared in 200 milliseconds or 4.9 seconds. Multiply that across 300 tests and you have added 25 minutes of pure wasted time to every CI run. That is 25 minutes your developers sit waiting for a green check before they can merge.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;They mask real problems.&lt;/strong&gt; If your app genuinely takes 5 seconds to render a button, that is a performance bug. A hardcoded wait hides that bug. An explicit wait with a reasonable timeout surfaces it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;They are still flaky.&lt;/strong&gt; The app loads in 5 seconds on your machine. On the CI runner with limited resources, it takes 7 seconds. Now the test fails again and someone bumps the wait to 10.&lt;/p&gt;

&lt;p&gt;The fix is straightforward but requires discipline. Replace every hardcoded wait with an explicit condition: wait for the element to be visible, wait for the network request to complete, wait for the loading spinner to disappear. Playwright and Cypress both have built-in mechanisms for this. Use them.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// This is the problem&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;waitForTimeout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5000&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;click&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;#submit&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// This is the fix&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;waitForSelector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;#submit&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;visible&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;click&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;#submit&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When we take over a suite, the first thing we do is search for &lt;code&gt;sleep&lt;/code&gt;, &lt;code&gt;wait&lt;/code&gt;, and &lt;code&gt;timeout&lt;/code&gt; calls. Replacing those alone typically cuts suite runtime by 30-40%.&lt;/p&gt;
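&lt;p&gt;A minimal sketch of that first-pass audit (the demo directory and the regex patterns are illustrative; extend the patterns for your stack):&lt;/p&gt;

```shell
# Create a tiny demo file, then search it the way we audit an inherited suite.
mkdir -p /tmp/suite-audit
printf 'await page.waitForTimeout(5000);\ncy.wait(3000);\n' > /tmp/suite-audit/example.spec.js

# Numeric-only patterns, so cy.wait('@alias') (a legitimate wait) is not flagged.
grep -rnE 'waitForTimeout\([0-9]|cy\.wait\([0-9]|time\.sleep\(|Thread\.sleep\(' /tmp/suite-audit
```

&lt;p&gt;Every hit is a candidate for replacement with an explicit, condition-based wait.&lt;/p&gt;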

&lt;h3&gt;No page object pattern (or an abandoned one)&lt;/h3&gt;

&lt;p&gt;The second most common problem is raw selectors duplicated across dozens of test files. The login page selector &lt;code&gt;#email-input&lt;/code&gt; appears in 40 different tests. The dashboard navigation selector &lt;code&gt;.nav-item.active&lt;/code&gt; shows up in 60.&lt;/p&gt;

&lt;p&gt;Then the frontend team renames a CSS class and 60 tests break simultaneously.&lt;/p&gt;

&lt;p&gt;The page object pattern exists specifically to solve this. You define your selectors in one place, your tests reference the page object, and when the UI changes you update one file instead of 60.&lt;/p&gt;
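&lt;p&gt;A minimal page object sketch (Playwright-style; the class and selector names are illustrative, not from any particular suite):&lt;/p&gt;

```javascript
// LoginPage owns every selector for the login screen. When the frontend
// renames a class, this file is the only place that changes.
class LoginPage {
  constructor(page) {
    this.page = page;
    this.emailInput = '#email-input';
    this.passwordInput = '#password-input';
    this.submitButton = '#login-submit';
  }

  // Tests call the behavior, never the raw selectors.
  async login(email, password) {
    await this.page.fill(this.emailInput, email);
    await this.page.fill(this.passwordInput, password);
    await this.page.click(this.submitButton);
  }
}

module.exports = { LoginPage };
```

&lt;p&gt;A test then reads &lt;code&gt;await loginPage.login(email, password)&lt;/code&gt; instead of three raw selector calls.&lt;/p&gt;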

&lt;p&gt;What I see more often than no page objects at all is an abandoned page object pattern. Someone started it, created page objects for the login page and maybe the dashboard, and then the team got busy and started writing selectors inline again. Now you have a codebase with two patterns, and you have to check both places when something breaks.&lt;/p&gt;

&lt;p&gt;If you are going to use page objects, commit to them. Every new test file should use them. If you are reviewing a PR that introduces a raw selector for a page that already has a page object, send it back.&lt;/p&gt;

&lt;p&gt;We have also started using &lt;a href="https://chromewebstore.google.com/detail/nicpbhgpaomjpfcakgdkklnkionajcje" rel="noopener noreferrer"&gt;Flows&lt;/a&gt;, our Chrome extension that records browser interactions and generates self-healing test selectors. The self-healing part matters because it addresses the brittle selector problem directly: if your selector breaks because someone changed a class name, Flows detects the shift and adapts. That removes the most painful part of page object maintenance, which is keeping selectors current when the frontend moves fast.&lt;/p&gt;

&lt;h3&gt;Testing implementation details instead of behavior&lt;/h3&gt;

&lt;p&gt;This one is subtle and I still catch experienced engineers doing it.&lt;/p&gt;

&lt;p&gt;A test that checks &lt;code&gt;expect(component.state.isLoading).toBe(false)&lt;/code&gt; is testing implementation. A test that checks &lt;code&gt;expect(screen.getByText('Dashboard')).toBeVisible()&lt;/code&gt; is testing behavior.&lt;/p&gt;

&lt;p&gt;Why does the distinction matter? Because implementation changes constantly. Someone refactors the loading state from a boolean to an enum. Someone moves from local state to a global store. Someone replaces the custom spinner with a library component. Every one of those changes breaks the implementation test while the actual user-facing behavior stays identical.&lt;/p&gt;

&lt;p&gt;Tests should answer one question: does the user see what they expect to see?&lt;/p&gt;

&lt;p&gt;When I audit a suite, I look for tests that reference internal state, internal method names, or specific DOM structure beyond what the user actually sees. Those tests are maintenance liabilities. They will break during refactors that change zero user-facing behavior, and every false failure erodes the team's trust in the suite.&lt;/p&gt;

&lt;p&gt;Write your test assertions the way a user would describe the expected result. "I click submit and I see a confirmation message." Not "I click submit and the Redux store's &lt;code&gt;formSubmission.status&lt;/code&gt; field equals &lt;code&gt;SUCCESS&lt;/code&gt;."&lt;/p&gt;
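&lt;p&gt;A contrived but runnable illustration of the difference (the &lt;code&gt;submitForm&lt;/code&gt; function and its internal status field are hypothetical):&lt;/p&gt;

```javascript
// submitForm returns both an internal status (implementation detail)
// and the message the user actually sees (behavior).
function submitForm(data) {
  const ok = Boolean(data.email);
  return {
    _status: ok ? 'SUCCESS' : 'ERROR', // internal; a refactor could rename or remove this
    visibleMessage: ok
      ? 'Thanks! We received your submission.'
      : 'Please enter an email address.', // user-facing; stable across refactors
  };
}

const result = submitForm({ email: 'user@example.com' });

// Brittle (implementation): breaks if the enum becomes a boolean or moves to a store.
// assert(result._status === 'SUCCESS');

// Robust (behavior): asserts what the user would describe seeing.
console.log(result.visibleMessage); // the confirmation message shown to the user
```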

&lt;h3&gt;No cleanup between tests&lt;/h3&gt;

&lt;p&gt;Tests should be independent. Each test should set up its own preconditions and clean up after itself. This is testing 101 and it is violated constantly.&lt;/p&gt;

&lt;p&gt;The symptom is test order dependence. Test A creates a user, Test B assumes that user exists, Test C deletes the user. Run them in order and everything passes. Run Test B alone and it fails. Run them in parallel and you get race conditions.&lt;/p&gt;

&lt;p&gt;I once inherited a suite where the entire test run depended on the first test creating a specific database seed. If that first test failed for any reason, every subsequent test failed too. The team had been living with this for a year, re-running the suite whenever the first test had a hiccup, and treating it as normal.&lt;/p&gt;

&lt;p&gt;That is not normal. That is a test suite that can only give you useful signal when conditions are perfect, which in CI environments is roughly never.&lt;/p&gt;

&lt;p&gt;The fix involves two things:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Before-each hooks for setup.&lt;/strong&gt; Every test (or test group) should create the data it needs. If test B needs a user, test B creates that user in a &lt;code&gt;beforeEach&lt;/code&gt; block.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;After-each hooks for teardown.&lt;/strong&gt; Delete what you created. Reset the state. Log out the session. If you are using an API to create test data (which you should be for speed), use that same API to clean up.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Each test owns its own data&lt;/span&gt;
&lt;span class="nf"&gt;beforeEach&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;testUser&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;api&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;createUser&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;email&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`test-&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()}&lt;/span&gt;&lt;span class="s2"&gt;@example.com`&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="nf"&gt;afterEach&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;testUser&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;api&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;deleteUser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;testUser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This adds a few seconds of setup per test but it eliminates an entire category of flakiness. The tradeoff is worth it every time.&lt;/p&gt;

&lt;h3&gt;Running everything sequentially when tests could run in parallel&lt;/h3&gt;

&lt;p&gt;Most test suites I inherit run every test in sequence. 400 tests, one after another, 45 minutes total. The team complains about slow CI. Nobody has tried parallelization.&lt;/p&gt;

&lt;p&gt;If your tests are independent (and after fixing the cleanup problem above, they should be), there is no reason they cannot run in parallel. Playwright supports parallel execution out of the box. Cypress has parallelization through their dashboard or through CI matrix strategies. Even pytest can parallelize with pytest-xdist.&lt;/p&gt;
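&lt;p&gt;In Playwright this is a config change, not a rewrite. A sketch (the worker count is illustrative; &lt;code&gt;workers&lt;/code&gt; and &lt;code&gt;fullyParallel&lt;/code&gt; are real Playwright Test options):&lt;/p&gt;

```javascript
// playwright.config.js
module.exports = {
  fullyParallel: true, // parallelize tests within a file, not just across files
  workers: process.env.CI ? 6 : undefined, // undefined lets Playwright pick per machine
};
```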

&lt;p&gt;The objections I hear are usually:&lt;/p&gt;

&lt;p&gt;"Our tests share a database." Then give each parallel worker its own database, or use unique prefixes per worker so the data does not collide.&lt;/p&gt;

&lt;p&gt;"Some tests are slow and some are fast, so parallelization does not help much." Use test sharding based on historical run times, not naive splitting by file.&lt;/p&gt;

&lt;p&gt;"We tried it and got flaky results." That means you have test isolation problems (see the cleanup section above). Fixing isolation fixes parallelization.&lt;/p&gt;

&lt;p&gt;On a recent client project we took a suite from 52 minutes sequential to 11 minutes across 6 parallel workers. Same tests, same CI machine. The only changes were fixing test isolation and enabling Playwright's built-in parallelism.&lt;/p&gt;

&lt;h3&gt;The real cost of bad automation&lt;/h3&gt;

&lt;p&gt;A bad test suite is worse than no test suite.&lt;/p&gt;

&lt;p&gt;That sounds extreme, but I mean it. A suite full of hardcoded waits, brittle selectors, and order-dependent tests produces two outcomes, both harmful:&lt;/p&gt;

&lt;p&gt;First, it creates false failures. Tests break for reasons unrelated to actual bugs. Developers learn to ignore the failures, re-run the suite, and merge anyway when it passes on the second try. At that point the suite is not catching bugs. It is a random gate that sometimes blocks merges for no reason.&lt;/p&gt;

&lt;p&gt;Second, it creates false confidence. Tests pass, so the team assumes the feature works. But the tests were checking implementation details that happen to still match, not actual user behavior that might have regressed. Bugs reach production despite a green test suite, and leadership starts questioning whether automation was worth the investment.&lt;/p&gt;

&lt;p&gt;The fix is not to abandon automation. The fix is to treat your test suite as production code. It needs code review. It needs refactoring. It needs maintenance. It needs someone who knows what they are doing.&lt;/p&gt;

&lt;h3&gt;What a healthy suite looks like&lt;/h3&gt;

&lt;p&gt;After we clean up an inherited suite, the result usually has these properties:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Zero hardcoded waits.&lt;/strong&gt; Every wait is explicit and condition-based.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Page objects for every page.&lt;/strong&gt; Selectors live in one place.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Behavior-focused assertions.&lt;/strong&gt; Tests describe what the user sees, not how the code works internally.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Full test isolation.&lt;/strong&gt; Any test can run alone or in any order.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Parallel execution.&lt;/strong&gt; Suite runtime is measured in minutes, not close to an hour.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Self-healing selectors where possible.&lt;/strong&gt; Tools like Flows reduce maintenance when the UI changes frequently.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None of this is revolutionary. It is basic engineering discipline applied to test code. The problem is that test code rarely gets the same attention as application code, and the debt accumulates until someone inherits the suite and has to deal with it.&lt;/p&gt;

&lt;p&gt;If you are building a suite from scratch, build it right from the start. If you have inherited one that has these problems, fix them incrementally: start with the waits, then add page objects for the most-referenced pages, then fix isolation one test group at a time.&lt;/p&gt;

&lt;p&gt;And if you would rather hand that work to someone who has done it dozens of times before, that is literally what we do.&lt;/p&gt;

&lt;p&gt;More on automation, testing strategy, and QA engineering at &lt;a href="https://betterqa.co/blog" rel="noopener noreferrer"&gt;betterqa.co/blog&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>testing</category>
      <category>automation</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Shift-left testing sounds great until you try to get invited to the meeting</title>
      <dc:creator>Tudor Brad</dc:creator>
      <pubDate>Thu, 09 Apr 2026 08:29:13 +0000</pubDate>
      <link>https://dev.to/tudorsss-betterqa/shift-left-testing-sounds-great-until-you-try-to-get-invited-to-the-meeting-308f</link>
      <guid>https://dev.to/tudorsss-betterqa/shift-left-testing-sounds-great-until-you-try-to-get-invited-to-the-meeting-308f</guid>
      <description>&lt;p&gt;I have never met a single person in software who disagrees with shift-left testing in theory. Earlier testing catches cheaper bugs. The data is clear. The logic is obvious.&lt;/p&gt;

&lt;p&gt;And yet.&lt;/p&gt;

&lt;p&gt;Try walking into a design review as a QA engineer. Try asking a product manager if you can sit in on sprint planning. Try suggesting that testing should start before a single line of code exists.&lt;/p&gt;

&lt;p&gt;You will get polite resistance. You will get scheduling conflicts that are not really conflicts. You will get "we'll loop you in later" emails that never arrive.&lt;/p&gt;

&lt;p&gt;I have been running QA teams for years, and the hardest part of shift-left has never been the testing. It has been the politics.&lt;/p&gt;

&lt;h3&gt;The math that everyone already knows&lt;/h3&gt;

&lt;p&gt;IBM published the numbers decades ago, and they have been validated repeatedly since. A bug found during requirements costs roughly $100 to fix. The same bug found in production costs $10,000 or more. That is a 100x multiplier.&lt;/p&gt;

&lt;p&gt;The Systems Sciences Institute at IBM put specific ranges on this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Requirements phase&lt;/strong&gt;: $100 per defect&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Design phase&lt;/strong&gt;: $300-600 per defect&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Implementation&lt;/strong&gt;: $1,000-2,000 per defect&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;System testing&lt;/strong&gt;: $3,000-5,000 per defect&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Production&lt;/strong&gt;: $10,000+ per defect&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;NIST backed this up with their own study estimating that software bugs cost the US economy $59.5 billion annually, with more than half of that cost attributable to bugs that could have been caught earlier.&lt;/p&gt;

&lt;p&gt;These are not controversial numbers. Every engineering leader has seen some version of this chart. It shows up in conference talks, blog posts, and onboarding decks at half the tech companies on the planet.&lt;/p&gt;

&lt;p&gt;So why does testing still start late?&lt;/p&gt;

&lt;h3&gt;The resistance nobody talks about&lt;/h3&gt;

&lt;p&gt;Here is what actually happens when you try to shift left.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Developers feel watched.&lt;/strong&gt; When a tester shows up to a design meeting, some developers interpret it as distrust. "Why do we need QA here? We haven't even written anything yet." The subtext is: you are here to find problems with my thinking, and I did not sign up for that.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Product managers feel slowed down.&lt;/strong&gt; Sprint planning already takes too long. Adding QA concerns means discussing edge cases, error states, and unhappy paths before anyone has committed to a direction. PMs want to move fast and refine later. QA wants to think through failure modes before the work begins. Those two instincts collide.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Testers feel unwelcome.&lt;/strong&gt; After a few rounds of being the person who raises problems in meetings full of people who want solutions, many QA engineers stop pushing. They wait for the handoff. They test what they are given. The shift-left conversation dies quietly.&lt;/p&gt;

&lt;p&gt;I have watched this pattern play out at dozens of organizations. The people are not wrong for feeling what they feel. The resistance is human, and pretending it does not exist is why most shift-left initiatives fail.&lt;/p&gt;

&lt;h3&gt;Two clients, two outcomes&lt;/h3&gt;

&lt;p&gt;I want to share two real situations because the contrast is stark.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Client A&lt;/strong&gt; brought us in during the requirements phase. We sat in on product discussions. We reviewed wireframes and user stories before development started. We wrote test scenarios alongside acceptance criteria.&lt;/p&gt;

&lt;p&gt;The result: their production bug rate dropped by roughly 50% within three months. Not because we were catching more bugs in testing, but because our questions during requirements eliminated entire categories of bugs before they were ever coded. Things like "what happens if the user has two active sessions?" or "does this flow work for users who skipped onboarding?" would surface in design, and the team would address them in the spec.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Client B&lt;/strong&gt; brought us in at staging. We got builds after development was done, filed bugs, and watched them get fixed in the next sprint. Classic.&lt;/p&gt;

&lt;p&gt;Their bug count in production stayed roughly flat quarter over quarter. They kept shipping the same types of bugs: missing validation on edge cases, broken flows for uncommon user paths, accessibility gaps nobody thought about. The bugs were not hard to find. They were predictable. They were the kind of bugs that disappear when someone asks the right questions during design.&lt;/p&gt;

&lt;p&gt;Same QA team. Same processes. Same tools. The only difference was when we entered the picture.&lt;/p&gt;

&lt;h3&gt;What shift-left actually looks like in practice&lt;/h3&gt;

&lt;p&gt;Shift-left is not about running unit tests earlier, though that helps. It is about involving testing thinking earlier. There is a difference.&lt;/p&gt;

&lt;p&gt;Here is what works:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Test case design during requirements.&lt;/strong&gt; Before a story moves to development, write the test scenarios. Not automated scripts. Just plain-language descriptions of what you are going to verify. This forces everyone to agree on expected behavior before anyone starts building.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;QA review of acceptance criteria.&lt;/strong&gt; Testers are better at finding ambiguity in specs than developers are, because testers think about what could go wrong. A developer reads "user can update their profile" and thinks about the happy path. A tester reads the same story and asks: what fields are required? What happens with special characters? Can they update while another session is active? Is there a character limit?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bug categorization and traceability.&lt;/strong&gt; Track where bugs originate. If 60% of your production bugs trace back to unclear requirements, that is your argument for QA in the requirements phase. Hard numbers beat theoretical frameworks every time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pair sessions between QA and dev.&lt;/strong&gt; Not formal meetings. Just a developer and a tester spending 20 minutes talking through a feature before implementation. These conversations catch misunderstandings that would otherwise become bugs.&lt;/p&gt;

&lt;h3&gt;Getting past the resistance&lt;/h3&gt;

&lt;p&gt;Here is the honest advice, based on what I have seen work.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Start small.&lt;/strong&gt; Do not try to get QA invited to every meeting. Pick one feature or one team. Demonstrate value with a contained experiment. When the team sees fewer bugs coming back from that feature, they will ask for more QA involvement on their own.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lead with questions, not criticism.&lt;/strong&gt; The fastest way to get uninvited from design meetings is to be the person who says "that won't work." Instead, ask questions. "How should this behave when the API is slow?" is more welcome than "you haven't considered the timeout case." Same concern, different framing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Show the cost data for your own project.&lt;/strong&gt; The IBM numbers are nice but generic. What actually moves people is your project's own data. Pull the bug reports from the last quarter. Categorize them by root cause. Calculate the time spent fixing production bugs versus the time it would have taken to catch them in requirements. That is a number your PM will care about.&lt;/p&gt;
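&lt;p&gt;A back-of-the-envelope version of that calculation (every number below is a placeholder; plug in your own tracker data):&lt;/p&gt;

```javascript
const prodBugs = 40;         // production bugs last quarter
const avgFixHours = 6;       // triage + fix + deploy per production bug
const specReviewHours = 0.5; // cost to catch the same issue in a requirements review
const catchableShare = 0.6;  // share traceable to unclear requirements

const hoursSpentNow = prodBugs * avgFixHours;
const hoursWithEarlyQA =
  prodBugs * catchableShare * specReviewHours +  // caught early, cheap
  prodBugs * (1 - catchableShare) * avgFixHours; // still found late

console.log(`Now: ${hoursSpentNow}h per quarter; with early QA review: ${hoursWithEarlyQA}h`);
```

&lt;p&gt;With these placeholder numbers, the quarter drops from 240 hours of firefighting to roughly 108. That is the kind of figure a PM will actually weigh against 15 minutes of review per story.&lt;/p&gt;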

&lt;p&gt;&lt;strong&gt;Accept that shift-left is a spectrum.&lt;/strong&gt; You do not need QA at every design meeting to get value. Even moving from "QA starts at staging" to "QA reviews stories before sprint commitment" is a significant shift. Take the win.&lt;/p&gt;

&lt;h3&gt;Where automation fits&lt;/h3&gt;

&lt;p&gt;Test-driven development, automated regression suites, CI/CD pipelines with quality gates: all of these are shift-left tools. They move verification earlier by making it cheaper to run tests frequently.&lt;/p&gt;

&lt;p&gt;But automation is the easy part. You can set up a CI pipeline in an afternoon. Getting a product manager to add 15 minutes of QA review to their sprint planning ceremony takes months of relationship building.&lt;/p&gt;

&lt;p&gt;The teams that get the most out of shift-left are the ones that combine both: automated checks that run early and often, plus human testers who participate in the thinking that happens before code exists.&lt;/p&gt;

&lt;h3&gt;The uncomfortable truth&lt;/h3&gt;

&lt;p&gt;Shift-left testing works. The data is overwhelming. The case studies are consistent. Organizations that test earlier ship better software with fewer production incidents.&lt;/p&gt;

&lt;p&gt;But it requires something that no framework or tool can provide: it requires developers and product managers to voluntarily share their planning process with people whose job is to find problems. That is an act of trust, and trust takes time.&lt;/p&gt;

&lt;p&gt;If you are a QA leader trying to push shift-left, be patient. Build relationships before building processes. Demonstrate value in small doses. And accept that the human side of this change is harder than the technical side.&lt;/p&gt;

&lt;p&gt;The bugs do not care about your feelings. They will be cheaper to fix early whether your team is ready to hear that or not. Your job is to make them ready.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;We write about testing practices, QA strategy, and the realities of running independent QA teams at &lt;a href="https://betterqa.co/blog" rel="noopener noreferrer"&gt;betterqa.co/blog&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>testing</category>
      <category>automation</category>
      <category>a11y</category>
      <category>devops</category>
    </item>
    <item>
      <title>We went 100% automation on a client project. Here's what broke.</title>
      <dc:creator>Tudor Brad</dc:creator>
      <pubDate>Thu, 09 Apr 2026 08:23:34 +0000</pubDate>
      <link>https://dev.to/tudorsss-betterqa/we-went-100-automation-on-a-client-project-heres-what-broke-2ge0</link>
      <guid>https://dev.to/tudorsss-betterqa/we-went-100-automation-on-a-client-project-heres-what-broke-2ge0</guid>
      <description>&lt;p&gt;Last year we had a client come to us after they'd fired their entire manual QA team. They'd invested six months into a Cypress suite with 400+ tests, hired two automation engineers, and felt confident they had testing covered.&lt;/p&gt;

&lt;p&gt;Three weeks after the manual testers left, their support tickets tripled.&lt;/p&gt;

&lt;p&gt;The automated suite was passing. Every single run: green. And their users were reporting bugs that no script had ever thought to check for.&lt;/p&gt;

&lt;p&gt;I've seen this play out at &lt;a href="https://betterqa.co" rel="noopener noreferrer"&gt;BetterQA&lt;/a&gt; more times than I can count. A team gets excited about automation, treats it as a silver bullet, and then learns the hard way that a green CI pipeline is not the same thing as a working product.&lt;/p&gt;

&lt;p&gt;This is the story of what actually happens when you go all-in on automation and abandon manual testing entirely.&lt;/p&gt;

&lt;h3&gt;The green suite problem&lt;/h3&gt;

&lt;p&gt;Here's the thing nobody tells you about a 100% automated test suite: it only checks for things you already thought of.&lt;/p&gt;

&lt;p&gt;Every automated test starts as a human decision. Someone sat down, considered a scenario, and wrote a script to verify it. That script will faithfully run that same check forever. It will never wonder "what happens if I click this button twice really fast?" or "does this flow still make sense after the last redesign?"&lt;/p&gt;

&lt;p&gt;On this particular project, the client's suite covered login flows, CRUD operations, payment processing, and a handful of API contract tests. Solid coverage on paper. But nobody was testing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What happens when a user fills out a form, leaves for 20 minutes, and comes back&lt;/li&gt;
&lt;li&gt;Whether the new dashboard layout actually makes sense to someone seeing it for the first time&lt;/li&gt;
&lt;li&gt;How the mobile experience feels on a slow 3G connection&lt;/li&gt;
&lt;li&gt;Whether the error messages help users recover or just confuse them&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are the kinds of things a manual tester catches in the first five minutes of an exploratory session. No script in the world is looking for them.&lt;/p&gt;

&lt;h3&gt;Automation is a regression tool, not a testing strategy&lt;/h3&gt;

&lt;p&gt;I need to be blunt about this because the industry has muddied the waters: automated testing and software testing are not the same thing.&lt;/p&gt;

&lt;p&gt;Automation is phenomenal at regression. You fixed a bug? Write a test so it never comes back. You have a critical payment flow? Automate it so every deploy verifies it still works. You need to run the same checks across 12 browser/device combinations? Automation saves you days of repetitive work.&lt;/p&gt;

&lt;p&gt;But regression is only one slice of testing. Exploratory testing, usability evaluation, edge case discovery, accessibility review, "does this feature actually solve the user's problem" testing: none of that can be scripted. Not because the tools aren't good enough, but because the value of those activities comes from human judgment and creativity.&lt;/p&gt;

&lt;p&gt;When our client killed their manual team, they didn't just lose testers. They lost the people who understood how real users interact with the product.&lt;/p&gt;

&lt;h3&gt;What broke (specifically)&lt;/h3&gt;

&lt;p&gt;Let me walk through the actual failures we saw on this project, because abstract arguments are easy to dismiss.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Usability regressions went unnoticed for weeks.&lt;/strong&gt; The dev team shipped a redesigned settings page. The automation suite verified that every button and input worked. What it couldn't tell them was that the new layout was confusing: users couldn't find the save button because it was below the fold on most screens. Support tickets piled up. A manual tester would have caught this in one session.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Edge cases multiplied.&lt;/strong&gt; The suite tested the happy path and a few known error states. But real users do unpredictable things. They paste formatted text from Word documents into plain text fields. They open the app in two tabs and edit the same record simultaneously. They use browser autofill in ways that break client-side validation. The automation engineers couldn't write scripts fast enough to cover the edge cases that a curious manual tester would stumble into organically.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;False confidence from false negatives.&lt;/strong&gt; The suite had several tests that were passing but not actually verifying what they claimed to verify. A selector had drifted after a UI update, so the test was clicking a different element and asserting on stale data. Green check mark, zero value. When we audited the suite, about 8% of the tests were essentially testing nothing. A manual tester running the same scenarios would have noticed immediately that the behavior was wrong.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dynamic content broke silently.&lt;/strong&gt; The app served personalized dashboards with data-driven layouts. The automation suite used hardcoded selectors and fixed test data. Every time the personalization engine changed what was displayed, tests either broke (noisy failures that got ignored) or passed incorrectly (silent failures that hid real issues). The team spent more time maintaining flaky tests than they saved by automating.&lt;/p&gt;

&lt;h3&gt;The maintenance tax nobody budgets for&lt;/h3&gt;

&lt;p&gt;This is the part that surprises teams the most. Automation isn't "write it once and forget it." It's a living codebase that needs maintenance, refactoring, and debugging just like your production code.&lt;/p&gt;

&lt;p&gt;On this project, the two automation engineers were spending roughly 60% of their time maintaining existing tests and only 40% writing new coverage. Every UI change, every feature flag toggle, every API response format update meant updating test scripts.&lt;/p&gt;

&lt;p&gt;Compare that to a manual tester who can adapt on the fly. The button moved? They find it. The API response changed shape? They notice the UI looks different and investigate. The feature flag is on? They test the new behavior. No script updates required.&lt;/p&gt;

&lt;p&gt;I'm not saying maintenance is a reason to avoid automation. I'm saying that if you don't budget for it, your "cost savings" from firing the manual team evaporate fast.&lt;/p&gt;

&lt;h3&gt;
  
  
  Complex scenarios resist automation
&lt;/h3&gt;

&lt;p&gt;Some testing scenarios are genuinely hard to automate well: multi-step workflows that span multiple systems, tests that depend on timing or environmental conditions, scenarios that require judgment calls about whether the output "looks right."&lt;/p&gt;

&lt;p&gt;We had one case where the client needed to test a document generation feature. The automation could verify that a PDF was produced and that it contained certain text strings. But it couldn't tell whether the formatting was correct, whether the layout was readable, or whether the generated content actually made sense in context. A human looks at the PDF and immediately knows if something is off.&lt;/p&gt;

&lt;p&gt;This isn't a tooling limitation that better frameworks will solve. It's a fundamental constraint: some quality attributes require human perception.&lt;/p&gt;

&lt;h3&gt;
  
  
  What we actually recommend
&lt;/h3&gt;

&lt;p&gt;When we onboarded this client, we didn't tell them to throw away their automation suite. That would have been equally wrong in the other direction.&lt;/p&gt;

&lt;p&gt;We helped them build a balanced approach:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Automate regression and smoke tests.&lt;/strong&gt; The things that need to pass on every deploy, the critical paths that must always work, the repetitive checks across environments and devices. This is where automation earns its keep.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Keep manual testers for exploratory work.&lt;/strong&gt; Dedicate time for testers to explore new features without a script. Let them break things creatively. Give them the freedom to follow their instincts when something feels off. This is where you find the bugs that matter most to users.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use manual testing for usability evaluation.&lt;/strong&gt; Before any major release, have a real human go through the key flows and ask: does this make sense? Is this intuitive? Would I be frustrated if I were a customer? No automated tool can answer these questions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Rotate who does exploratory testing.&lt;/strong&gt; Don't limit it to QA. Developers, product managers, designers: fresh eyes catch things that familiar eyes skip. The person who built the feature is the worst person to evaluate whether it's intuitive.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Review your automation suite quarterly.&lt;/strong&gt; Audit for false negatives, outdated selectors, tests that pass but don't verify anything meaningful. Prune ruthlessly. A smaller suite that actually catches bugs is worth more than a massive suite that gives you false confidence.&lt;/p&gt;

&lt;h3&gt;
  
  
  The AI question
&lt;/h3&gt;

&lt;p&gt;I'll address the elephant in the room because everyone asks. With AI-powered testing tools getting better every month, does this change the equation?&lt;/p&gt;

&lt;p&gt;Our founder Tudor Brad has a line I keep coming back to: "AI will replace development before it replaces QA."&lt;/p&gt;

&lt;p&gt;His reasoning is sound. AI can generate code, but the act of evaluating whether that code does what users actually need requires human judgment. And with AI accelerating development speed (features that used to take months now take hours), the volume of things that need testing is exploding. You don't need less QA in an AI-accelerated world. You need more.&lt;/p&gt;

&lt;p&gt;AI tools are great at generating test cases, identifying patterns in bug reports, and even doing some basic visual regression checking. But the core question of "does this product work for real humans in real situations" still requires a human in the loop.&lt;/p&gt;

&lt;h3&gt;
  
  
  The real lesson
&lt;/h3&gt;

&lt;p&gt;The client I mentioned at the start? After we helped them rebuild their testing approach with a mix of automation and manual testing, their support ticket volume dropped within six weeks to where it had been before they went automation-only. They kept their Cypress suite. They also brought back two manual testers.&lt;/p&gt;

&lt;p&gt;The lesson isn't that automation is bad. The lesson is that automation is a tool, not a strategy. And the teams that treat it as the entire strategy are the ones who end up with a green pipeline and angry users.&lt;/p&gt;

&lt;p&gt;If your test suite is passing and your users are still finding bugs, the suite isn't the problem. What's missing is the human who would have found those bugs first.&lt;/p&gt;

&lt;p&gt;More on testing strategy and QA methodology on the &lt;a href="https://betterqa.co/blog" rel="noopener noreferrer"&gt;BetterQA blog&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>testing</category>
      <category>ai</category>
      <category>automation</category>
      <category>a11y</category>
    </item>
    <item>
      <title>Manual testing isn't dying, but manual testers need to change</title>
      <dc:creator>Tudor Brad</dc:creator>
      <pubDate>Thu, 09 Apr 2026 08:23:30 +0000</pubDate>
      <link>https://dev.to/tudorsss-betterqa/manual-testing-isnt-dying-but-manual-testers-need-to-change-5289</link>
      <guid>https://dev.to/tudorsss-betterqa/manual-testing-isnt-dying-but-manual-testers-need-to-change-5289</guid>
      <description>&lt;p&gt;I run a QA company with 50-plus engineers spread across 24 countries. Roughly half of them do manual testing. Not because we're behind the times. Because that's what our clients need.&lt;/p&gt;

&lt;p&gt;Every conference talk, every LinkedIn influencer, every bootcamp curriculum pushes the same story: automate everything, manual testing is a relic, if you're clicking through a UI in 2024 you're wasting money. I've heard this for years. And every year, the demand for skilled manual testers at &lt;a href="https://betterqa.co" rel="noopener noreferrer"&gt;BetterQA&lt;/a&gt; grows.&lt;/p&gt;

&lt;p&gt;So let me say what I actually think. Manual testing isn't dying. But the version of manual testing that people imagine when they hear the phrase? That version probably should die.&lt;/p&gt;

&lt;h3&gt;
  
  
  The boring manual testing is already dead
&lt;/h3&gt;

&lt;p&gt;Let me be clear about what I'm not defending.&lt;/p&gt;

&lt;p&gt;If your manual testing process involves a tester opening a spreadsheet of 200 test cases, clicking through each one in sequence, writing "pass" or "fail" in a column, and repeating this before every release, then yes. Automate that. Automate it yesterday. That kind of work destroys morale, produces inconsistent results, and costs more per bug found than any reasonable automation framework.&lt;/p&gt;

&lt;p&gt;We automated repetitive regression testing years ago. We built &lt;a href="https://chromewebstore.google.com/detail/betterqa-flows/gpoacfandmbjlipmccjlnpfheiocbigl" rel="noopener noreferrer"&gt;Flows&lt;/a&gt;, a Chrome extension that records browser interactions and replays them as tests with self-healing selectors. The entire point was to free our manual testers from the mechanical parts of the job so they could spend time on the work that actually requires a human brain.&lt;/p&gt;

&lt;p&gt;When people say "manual testing is dying," they usually mean this repetitive, scripted, follow-the-checklist kind. And they're right. It should die. The problem is that they then leap to the conclusion that all manual testing should die, and that's where they're wrong.&lt;/p&gt;

&lt;h3&gt;
  
  
  What manual testers actually do now
&lt;/h3&gt;

&lt;p&gt;The testers on my team who do manual work aren't clicking through login forms all day. Here's what their week actually looks like.&lt;/p&gt;

&lt;p&gt;They spend time in exploratory testing sessions, deliberately trying to break things in ways nobody anticipated. They navigate the product the way a confused user would, not the way a specification document describes. They find bugs that no automation script would ever catch because no one thought to write a test for that scenario.&lt;/p&gt;

&lt;p&gt;They review designs and requirements before a single line of code gets written. This is the cheapest place to find defects. A bug caught in a requirements review costs almost nothing to fix. The same bug found in production costs 100 times more. That's not an exaggeration. It's a well-documented cost multiplier that's held up across decades of software engineering research.&lt;/p&gt;

&lt;p&gt;They do usability assessments. They sit with the product and ask questions like: would a real person understand this flow? Does this error message actually tell you what went wrong? Is the button where you'd expect it to be? Automation can tell you whether a button exists on the page. It cannot tell you whether the button makes sense.&lt;/p&gt;

&lt;p&gt;They run accessibility checks. Not just automated scans (those miss roughly 60-70% of real accessibility barriers), but actual screen reader walkthroughs, keyboard-only navigation, cognitive load evaluation. A WCAG compliance tool will tell you that a form label exists. A manual tester will tell you that the label says "Field 3" and means nothing to anyone.&lt;/p&gt;

&lt;p&gt;They probe for security issues. Not full penetration testing necessarily, but the kind of poking around that finds exposed data in API responses, broken authorization checks, session handling problems. With AI-generated code flooding into production, this kind of investigative work matters more than it did five years ago.&lt;/p&gt;

&lt;h3&gt;
  
  
  The "automate everything" pressure is real, and it's partially nonsense
&lt;/h3&gt;

&lt;p&gt;I get why engineering leaders push for full automation. The pitch is seductive. Write the tests once, run them forever, get fast feedback, reduce headcount. What's not to like?&lt;/p&gt;

&lt;p&gt;Here's what I've seen happen in practice.&lt;/p&gt;

&lt;p&gt;A client moves to 100% automation. Their Selenium or Playwright suite covers all the happy paths beautifully. CI runs green. Everyone feels confident. Then they ship a feature where the shopping cart total displays correctly but the font is 4px and grey on grey. A human would catch that in seconds. The automation suite doesn't check font sizes because nobody thought to add that assertion. A customer screenshots it, posts it on Twitter, and suddenly "fully automated QA" looks a lot less impressive.&lt;/p&gt;

&lt;p&gt;Another client automates their entire regression suite. Takes three months and costs a fortune. Then the product team redesigns the navigation. Forty percent of the automated tests break, not because of bugs but because the selectors changed. Now you have an automation maintenance backlog that's bigger than the original testing backlog. The team spends more time fixing tests than writing new ones.&lt;/p&gt;

&lt;p&gt;Automation is powerful, genuinely powerful, for specific categories of testing. Cross-browser compatibility. Regression on stable features. Performance benchmarks. Data-driven tests where you need to run the same flow with 500 different input combinations. For those things, automation is not just better than manual testing, it's the only sane option.&lt;/p&gt;

&lt;p&gt;But automation is terrible at answering "does this feel right?" It can't do creative exploration. It can't notice that the loading spinner is technically working but feels sluggish in a way that will irritate users. It can't look at a form and realize that the field order doesn't match the mental model a healthcare administrator has when processing patient intake.&lt;/p&gt;

&lt;h3&gt;
  
  
  What's actually changing for manual testers
&lt;/h3&gt;

&lt;p&gt;Here's the honest part that the "manual testing is dead" crowd gets right, even if they get the conclusion wrong. The job description is changing fast.&lt;/p&gt;

&lt;p&gt;Five years ago, a junior manual tester could get by with basic test case execution skills. Open the app, follow steps, report results. That's not enough anymore.&lt;/p&gt;

&lt;p&gt;The manual testers who are thriving on our team have skills that overlap with product management, security analysis, and UX research. They understand API calls well enough to check what's happening under the hood when the UI looks fine. They use browser DevTools to inspect network requests, check response payloads, verify that sensitive data isn't leaking in places it shouldn't be. They understand enough about accessibility standards to do meaningful evaluations, not just run an axe scan and forward the results.&lt;/p&gt;

&lt;p&gt;They're also comfortable working alongside automation. On most of our client projects, the same team handles both. A manual tester explores a new feature, finds the edge cases, documents them, and then works with the automation engineer to decide which paths are worth scripting for regression and which are one-time exploratory findings. That collaboration is where the real quality comes from. Not from one discipline replacing the other.&lt;/p&gt;

&lt;p&gt;Our founder Tudor Brad has a line he uses a lot: "AI will replace development before it replaces QA." It sounds provocative, and he means it to be. But the core point is serious. AI tools can generate code. They can even generate test scripts. What they cannot do is understand whether a product feels right to use, whether a workflow makes sense for the specific humans who will use it, or whether a security boundary that technically exists is actually robust enough. That requires judgment, creativity, and domain knowledge that nobody has automated yet.&lt;/p&gt;

&lt;h3&gt;
  
  
  The vibe coding problem
&lt;/h3&gt;

&lt;p&gt;This part is new, and it matters.&lt;/p&gt;

&lt;p&gt;We're seeing more client projects where significant chunks of the codebase were generated by AI tools. GitHub Copilot, Claude, ChatGPT, whatever the flavour of the month is. The code works, mostly. It passes the unit tests that the AI also generated. And it ships with subtle bugs that only surface when a real person uses the product in ways the AI didn't anticipate.&lt;/p&gt;

&lt;p&gt;I've seen AI-generated form validation that checked email format but not length, allowing a 10,000-character email to crash the backend. I've seen AI-generated pagination that worked perfectly for pages 1 through 10 but returned duplicate results on page 11. These aren't exotic edge cases. They're the kind of thing a manual tester finds in their first hour with the feature because they naturally try inputs that a generated test suite doesn't consider.&lt;/p&gt;
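&lt;p&gt;To make the length problem concrete, here's a minimal sketch (not the client's actual code; the function name and the deliberately simple regex are illustrative) of a validator that bounds length before checking format, so an absurdly long "valid-looking" address never reaches the backend:&lt;/p&gt;

```javascript
// Illustrative sketch only. The point: format alone is not enough --
// the total length needs a bound too (RFC 5321 caps an address at
// 254 characters in practice).
function isValidEmail(email) {
  if (typeof email !== "string") return false;
  // Length checks first, before any pattern matching.
  if (email.length === 0) return false;
  if (email.length > 254) return false;
  // Deliberately simple format check for the sketch.
  return /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(email);
}
```

&lt;p&gt;With the length guard in place, &lt;code&gt;"a".repeat(10000) + "@example.com"&lt;/code&gt; is rejected before the regex ever runs, which is exactly the input that took the backend down.&lt;/p&gt;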

&lt;p&gt;As AI-assisted development accelerates the speed of feature delivery, the demand for people who can thoughtfully evaluate those features goes up, not down. Features that took three months to build now take three hours. That same speed produces more surface area for defects. You need testing that can match that pace, and skilled exploratory testers are faster at covering new ground than any test automation framework.&lt;/p&gt;

&lt;h3&gt;
  
  
  What I tell testers who are worried about their careers
&lt;/h3&gt;

&lt;p&gt;If you're a manual tester and you're nervous about automation replacing you, I understand the anxiety. But I think the threat is misidentified.&lt;/p&gt;

&lt;p&gt;The thing that will make you irrelevant is not automation. It's refusing to evolve what "manual testing" means for you personally.&lt;/p&gt;

&lt;p&gt;Learn how to use browser DevTools. Understand enough about APIs to read a response payload. Get comfortable with accessibility testing beyond just running a scanner. Develop a specialty: security probing, or usability evaluation, or data integrity analysis. Understand CI/CD pipelines well enough to know when and where your testing fits in the release process.&lt;/p&gt;

&lt;p&gt;You don't need to become a programmer. But you need to be more than someone who follows a test script. The testers on my team who are most in demand with clients are the ones who can sit in a sprint planning meeting, hear a feature described, and immediately start asking questions that expose gaps in the requirements. That's not a skill automation replaces. It's a skill that makes automation more effective because it ensures the right things get automated in the first place.&lt;/p&gt;

&lt;h3&gt;
  
  
  The honest answer
&lt;/h3&gt;

&lt;p&gt;Manual testing isn't dying. What's dying is the job description that says "execute pre-written test cases and record results." That work is being absorbed by automation, and it should be.&lt;/p&gt;

&lt;p&gt;What's growing is the need for people who can think critically about software quality, who can explore products with creativity and suspicion, who can translate technical findings into business risk, and who can evaluate whether something that technically works actually works well for the people who'll use it.&lt;/p&gt;

&lt;p&gt;The boring repetitive stuff? Automate it and don't look back.&lt;/p&gt;

&lt;p&gt;The creative investigative work? That's more valuable now than it's ever been. And I don't see that changing anytime soon.&lt;/p&gt;

&lt;p&gt;If you're interested in how we approach testing at BetterQA, or you want to see more of our thinking on QA in the AI era, check out &lt;a href="https://betterqa.co/blog" rel="noopener noreferrer"&gt;betterqa.co/blog&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>testing</category>
      <category>ai</category>
      <category>automation</category>
      <category>security</category>
    </item>
    <item>
      <title>The test case mistakes we see on every new client project</title>
      <dc:creator>Tudor Brad</dc:creator>
      <pubDate>Thu, 09 Apr 2026 08:16:21 +0000</pubDate>
      <link>https://dev.to/tudorsss-betterqa/the-test-case-mistakes-we-see-on-every-new-client-project-311m</link>
      <guid>https://dev.to/tudorsss-betterqa/the-test-case-mistakes-we-see-on-every-new-client-project-311m</guid>
      <description>&lt;p&gt;I lead QA onboarding at BetterQA. When a new client signs on, one of the first things I do is audit their existing test suite. I open it up, scroll through a few hundred test cases, and within about twenty minutes I can tell you exactly how much of it is useful.&lt;/p&gt;

&lt;p&gt;Usually? About half.&lt;/p&gt;

&lt;p&gt;That might sound harsh, but after doing this across dozens of client projects with a team of 50+ engineers, the patterns are so consistent it's almost boring. The same mistakes, the same dead weight, the same "we wrote these two years ago and nobody's touched them since."&lt;/p&gt;

&lt;p&gt;The worst version of this is inheriting a 2,000-test suite where the team proudly tells you their pass rate is 97%. Then you look closer and realize 600 of those tests have no real assertions. Another 300 are duplicates with slightly different names. A hundred are flaky and get re-run until they pass. The 97% number is meaningless. It just makes everyone feel good while bugs keep shipping to production.&lt;/p&gt;

&lt;p&gt;Here's what I keep finding.&lt;/p&gt;

&lt;h3&gt;
  
  
  Tests that test the framework, not the app
&lt;/h3&gt;

&lt;p&gt;This is the single most common problem, and it's the sneakiest one because the tests look legitimate. They run. They pass. They show up green in the CI pipeline. Everyone's happy.&lt;/p&gt;

&lt;p&gt;But the test isn't actually verifying that your application does something correctly. It's verifying that React renders a component. Or that a form element exists on the page. Or that clicking a button fires an event handler.&lt;/p&gt;

&lt;p&gt;I saw a suite last year where someone had written 40 tests for a checkout flow. Every single one was checking that UI elements rendered. Not one test verified that an order was actually created, that inventory was decremented, or that the payment was processed. The checkout could have been completely broken and all 40 tests would still pass.&lt;/p&gt;

&lt;p&gt;The fix is simple but requires discipline: every test needs to assert something about your business logic, not about whether your framework is doing its job. If you're testing that a button exists, that's a framework test. If you're testing that clicking the button creates an order with the correct line items, that's an application test.&lt;/p&gt;
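&lt;p&gt;To illustrate the difference, here's a plain JavaScript sketch. The in-memory &lt;code&gt;createOrder&lt;/code&gt; service is invented for the example, standing in for whatever the real backend does:&lt;/p&gt;

```javascript
// Hypothetical in-memory order service, standing in for the real backend.
const inventory = { "sku-1": 10 };
const orders = [];

function createOrder(items) {
  for (const item of items) {
    if (inventory[item.sku] === undefined) throw new Error("unknown sku");
    if (inventory[item.sku] - item.qty >= 0) {
      inventory[item.sku] -= item.qty;
    } else {
      throw new Error("insufficient stock");
    }
  }
  const order = { id: orders.length + 1, items };
  orders.push(order);
  return order;
}

// Framework-style check (weak): only proves something was returned.
const order = createOrder([{ sku: "sku-1", qty: 2 }]);

// Application-style checks (strong): assert the business outcome.
console.assert(order.items[0].qty === 2, "order has the right line items");
console.assert(inventory["sku-1"] === 8, "inventory was decremented");
```

&lt;p&gt;The weak version passes even if inventory never changes. The strong version fails the moment the business logic breaks, which is the entire job of the test.&lt;/p&gt;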

&lt;h3&gt;
  
  
  Tests with no meaningful assertions
&lt;/h3&gt;

&lt;p&gt;Related but distinct from the framework problem. These tests go through a whole flow, click things, fill out forms, navigate between pages, and then... nothing. No assertion at the end. Or a single assertion that checks something trivial, like the page title.&lt;/p&gt;

&lt;p&gt;I opened a Cypress suite for a client last quarter and found 15 tests that navigated to various pages and asserted &lt;code&gt;cy.url().should('include', '/dashboard')&lt;/code&gt;. That was it. The tests confirmed you could reach the dashboard. They said nothing about whether the dashboard was showing the right data, whether the charts loaded, whether the filters worked.&lt;/p&gt;

&lt;p&gt;The tester who wrote them probably had good intentions. They were probably under pressure to increase test coverage numbers. So they wrote tests that technically covered pages without actually verifying anything useful.&lt;/p&gt;

&lt;p&gt;If your test doesn't have an assertion that would fail when the feature breaks, it's not a test. It's a page visit.&lt;/p&gt;

&lt;h3&gt;
  
  
  Copy-pasted tests with wrong expected values
&lt;/h3&gt;

&lt;p&gt;This one physically hurts when I find it. Someone writes a solid test for Scenario A. Then they need a similar test for Scenario B, so they copy-paste and change a few things. But they forget to update the expected values. Now you have a test for Scenario B that's asserting Scenario A's expected output, and it's been passing for months because the assertion is loose enough to match both.&lt;/p&gt;

&lt;p&gt;We onboarded a fintech client where this was happening in their pricing calculation tests. Three variants of a discount test all expected the same final price, even though the discount percentages were different. Nobody noticed because the tests passed. The actual discount logic had a bug that made all three discounts produce the same result, which was wrong, but the tests said everything was fine.&lt;/p&gt;

&lt;p&gt;Copy-paste is fine. But you have to treat every pasted test as a new test. Read the expected values. Ask yourself if they make sense for this specific scenario. Better yet, calculate them independently rather than copying them from the original.&lt;/p&gt;
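&lt;p&gt;A sketch of what that looks like in practice; &lt;code&gt;applyDiscount&lt;/code&gt; is a hypothetical stand-in for the client's pricing logic:&lt;/p&gt;

```javascript
// Hypothetical discount function standing in for the real pricing logic.
function applyDiscount(price, percent) {
  return Math.round(price * (1 - percent / 100) * 100) / 100;
}

// Each scenario carries its own independently computed expected value
// instead of inheriting one from a copy-pasted sibling. If a bug made
// every discount produce the same result, these would diverge and fail.
const scenarios = [
  { price: 200, percent: 10, expected: 180 }, // 200 * 0.90
  { price: 200, percent: 25, expected: 150 }, // 200 * 0.75
  { price: 200, percent: 50, expected: 100 }, // 200 * 0.50
];

for (const s of scenarios) {
  const actual = applyDiscount(s.price, s.percent);
  if (actual !== s.expected) {
    throw new Error(`${s.percent}% off ${s.price} gave ${actual}, expected ${s.expected}`);
  }
}
```

&lt;p&gt;The fintech bug I described would have been caught on day one with this structure, because three different percentages asserting three different totals cannot all pass when the logic collapses them into one.&lt;/p&gt;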

&lt;h3&gt;
  
  
  Flaky tests that nobody fixes
&lt;/h3&gt;

&lt;p&gt;Every team has them. Tests that fail randomly, pass on retry, and gradually erode everyone's trust in the suite. The typical lifecycle goes like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Test starts failing intermittently&lt;/li&gt;
&lt;li&gt;Someone adds a retry mechanism&lt;/li&gt;
&lt;li&gt;Retries mask the flakiness&lt;/li&gt;
&lt;li&gt;Team stops investigating failures because "it's probably just flaky"&lt;/li&gt;
&lt;li&gt;Real bugs start slipping through because failures get dismissed&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I've seen teams with 30-40 known flaky tests that they just re-run whenever CI fails. At that point, your CI pipeline isn't catching bugs. It's a slot machine that eventually gives you a green build if you pull the lever enough times.&lt;/p&gt;

&lt;p&gt;The painful truth is that flaky tests are usually flaky for a reason: timing dependencies, shared state between tests, hardcoded test data that conflicts with other tests, or assumptions about the order things load. These are fixable problems. They just require someone to sit down and actually diagnose them instead of adding another retry.&lt;/p&gt;
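&lt;p&gt;For the timing case, here's a minimal sketch of the usual fix: poll for the condition instead of sleeping a fixed amount. The &lt;code&gt;waitFor&lt;/code&gt; helper is illustrative, not from any particular framework:&lt;/p&gt;

```javascript
// Poll a condition until it holds or a deadline passes, instead of
// sleeping a hardcoded delay and hoping the app was fast enough.
// On timeout the test fails loudly for a real, diagnosable reason.
function waitFor(predicate, timeoutMs = 5000, intervalMs = 50) {
  const deadline = Date.now() + timeoutMs;
  return new Promise((resolve, reject) => {
    const tick = () => {
      if (predicate()) {
        resolve(true);
      } else if (Date.now() > deadline) {
        reject(new Error(`condition not met within ${timeoutMs}ms`));
      } else {
        setTimeout(tick, intervalMs);
      }
    };
    tick();
  });
}

// Usage: wait for an async result to appear rather than sleeping 2s.
let loaded = false;
setTimeout(() => { loaded = true; }, 100);
waitFor(() => loaded).then(() => console.log("ready"));
```

&lt;p&gt;Most test runners and UI automation tools ship a built-in version of this; the point is to use it in place of fixed sleeps everywhere a test waits on something asynchronous.&lt;/p&gt;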

&lt;p&gt;At BetterQA, when we inherit a flaky suite, the first thing we do is quarantine the flaky tests. Move them out of the main pipeline. Run them separately. Then fix them one by one. It's tedious work but it's the only way to make the suite trustworthy again.&lt;/p&gt;

&lt;h3&gt;
  
  
  No separation between smoke, regression, and edge cases
&lt;/h3&gt;

&lt;p&gt;When every test has the same priority and runs in the same pipeline, you end up with 45-minute CI runs where critical path tests are mixed in with obscure edge case validations. A developer pushes a one-line CSS fix and waits 45 minutes to find out if it broke anything.&lt;/p&gt;

&lt;p&gt;The result is predictable: people start skipping CI, merging without waiting for tests, or just ignoring red builds because "it's probably that one slow test again."&lt;/p&gt;

&lt;p&gt;A healthy suite has layers. Smoke tests that run in under 5 minutes and cover the critical paths. Regression tests that run on merge to main. Edge case and exploratory tests that run nightly or on-demand. When everything is lumped together, nothing gets the attention it deserves.&lt;/p&gt;

&lt;h3&gt;
  
  
  Test data that's hardcoded and brittle
&lt;/h3&gt;

&lt;p&gt;Hardcoded IDs, specific usernames, dates that assume a certain timezone, URLs that point to a staging server that got decommissioned six months ago. I see all of these constantly.&lt;/p&gt;

&lt;p&gt;The worst case I encountered was a test suite that had a user's actual production email address hardcoded in 200+ tests. The tests were hitting a staging API, but if anyone accidentally pointed them at production, they'd spam a real customer with test emails. Beyond the safety issue, those tests broke every time the staging database got refreshed because the hardcoded user no longer existed.&lt;/p&gt;

&lt;p&gt;Test data should be created by the test, used by the test, and cleaned up by the test. If your test depends on something that already exists in the database, it's one environment reset away from failing.&lt;/p&gt;
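&lt;p&gt;A sketch of that lifecycle, using an in-memory &lt;code&gt;Map&lt;/code&gt; as a stand-in for the database; all names here are invented:&lt;/p&gt;

```javascript
// In-memory stand-in for the test database.
const db = new Map();

// The test creates its own uniquely named record, so it never collides
// with other tests and never depends on pre-seeded rows that an
// environment refresh can wipe out.
function createTestUser() {
  const id = `test-user-${Date.now()}-${Math.random().toString(36).slice(2)}`;
  const user = { id, email: `${id}@example.test` };
  db.set(id, user);
  return user;
}

function deleteTestUser(id) {
  db.delete(id);
}

// Create, use, clean up -- all inside the test itself.
const user = createTestUser();
try {
  if (!db.has(user.id)) throw new Error("user should exist during the test");
} finally {
  deleteTestUser(user.id);
}
```

&lt;p&gt;The &lt;code&gt;finally&lt;/code&gt; block matters: cleanup runs even when the assertion fails, so one broken test doesn't leave orphaned data that flakes the next one.&lt;/p&gt;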

&lt;h3&gt;
  
  
  Tests that verify implementation, not behavior
&lt;/h3&gt;

&lt;p&gt;This is a subtler problem but it kills test suite longevity. When tests are tightly coupled to implementation details (specific CSS selectors, internal component state, exact API response shapes), any refactoring breaks them even if the behavior is identical.&lt;/p&gt;

&lt;p&gt;I've watched teams avoid refactoring because "it would break too many tests." That's backwards. Tests should give you confidence to refactor. If they're blocking refactors, they're testing the wrong things.&lt;/p&gt;

&lt;p&gt;Test the behavior the user sees. The login form accepts credentials and redirects to the dashboard. The search returns relevant results. The export generates a file with the correct data. If you refactor the internals and those behaviors still work, your tests should still pass.&lt;/p&gt;

&lt;h3&gt;
  
  
  No traceability between tests and requirements
&lt;/h3&gt;

&lt;p&gt;This is the organizational problem underneath all the technical ones. When tests aren't linked to requirements, user stories, or bug reports, nobody knows which tests matter and which are leftovers from features that were redesigned or removed.&lt;/p&gt;

&lt;p&gt;We built &lt;a href="https://bugboard.co" rel="noopener noreferrer"&gt;BugBoard&lt;/a&gt; partly because of this problem. When you can see which tests are actually catching bugs versus which ones have been passing quietly for two years without ever failing, you start to understand the real health of your suite. A test that has never failed might be rock-solid validation of a stable feature. Or it might be testing nothing useful. Without traceability, you can't tell the difference.&lt;/p&gt;

&lt;h3&gt;
  
  
  How we fix this when onboarding clients
&lt;/h3&gt;

&lt;p&gt;When we take over a test suite, the process looks like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Audit pass&lt;/strong&gt;: read every test, tag it with what it actually validates, flag the ones with weak or missing assertions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Quarantine flaky tests&lt;/strong&gt;: pull them out of the main pipeline, track them separately&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prioritize by risk&lt;/strong&gt;: map tests to features ranked by business impact, find the gaps where critical features have no coverage&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Kill the dead weight&lt;/strong&gt;: delete tests that test framework behavior, have no assertions, or duplicate other tests&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fix what remains&lt;/strong&gt;: stabilize the flaky tests, update hardcoded data, decouple from implementation details&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;It's not glamorous work. It takes time. But the difference between a 2,000-test suite with 50% useful coverage and a 900-test suite with 95% useful coverage is enormous. The smaller suite runs faster, fails for real reasons, and actually catches bugs before they ship.&lt;/p&gt;

&lt;h3&gt;
  
  
  The uncomfortable math
&lt;/h3&gt;

&lt;p&gt;If you have 1,000 tests and 400 of them are noise, every developer on your team is waiting for those 400 useless tests to run on every CI build. Multiply that wait time by the number of builds per day, the number of developers, and the number of working days in a year. You're burning weeks of engineering time on tests that provide zero value.&lt;/p&gt;

&lt;p&gt;That's before you count the cognitive cost. When developers see tests fail and their first reaction is "it's probably flaky," you've already lost. The test suite has become background noise instead of a safety net.&lt;/p&gt;

&lt;h3&gt;
  
  
  Start with honesty
&lt;/h3&gt;

&lt;p&gt;The hardest part of fixing a test suite is admitting it needs fixing. Nobody wants to hear that the 2,000 tests they spent months writing are half useless. But the alternative is continuing to invest in something that gives you false confidence while bugs keep reaching production.&lt;/p&gt;

&lt;p&gt;If you want to see the patterns I've described in your own suite, start with one question: for each test, what specific bug would this catch? If you can't answer that clearly, the test needs work or removal.&lt;/p&gt;

&lt;p&gt;We write about testing patterns, QA team structure, and what we learn from client projects on our blog: &lt;a href="https://betterqa.co/blog" rel="noopener noreferrer"&gt;betterqa.co/blog&lt;/a&gt;&lt;/p&gt;

</description>
      <category>testing</category>
      <category>automation</category>
      <category>devops</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Your team is confusing bug severity with priority, and it's costing you sprints</title>
      <dc:creator>Tudor Brad</dc:creator>
      <pubDate>Thu, 09 Apr 2026 08:16:16 +0000</pubDate>
      <link>https://dev.to/tudorsss-betterqa/your-team-is-confusing-bug-severity-with-priority-and-its-costing-you-sprints-4jjc</link>
      <guid>https://dev.to/tudorsss-betterqa/your-team-is-confusing-bug-severity-with-priority-and-its-costing-you-sprints-4jjc</guid>
      <description>&lt;p&gt;I've sat through hundreds of sprint planning sessions where someone says "this is a P1" and someone else says "no, it's a sev-3" and then the whole room argues for fifteen minutes about a tooltip that renders wrong on Firefox. Nobody ships anything. The standup runs long. Half the team checks out mentally because they've had this exact argument before.&lt;/p&gt;

&lt;p&gt;The root problem is simple: most teams use "severity" and "priority" interchangeably, and that confusion creates real damage. Bugs get fixed in the wrong order. Critical issues sit in backlog while someone polishes a cosmetic fix that a stakeholder complained about in Slack.&lt;/p&gt;

&lt;p&gt;At &lt;a href="https://betterqa.co" rel="noopener noreferrer"&gt;BetterQA&lt;/a&gt;, we triage thousands of bugs across dozens of client projects every month. This confusion shows up constantly, and it's one of the first things we fix when onboarding a new team.&lt;/p&gt;

&lt;h3&gt;Severity is about impact, priority is about urgency&lt;/h3&gt;

&lt;p&gt;That's the whole distinction. Once you internalize it, triage gets dramatically faster.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Severity&lt;/strong&gt; answers: how broken is this? How much damage does the bug cause to the system, the data, or the user's ability to do their job?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Priority&lt;/strong&gt; answers: how soon do we need to fix it? Given everything else on our plate, where does this land in the queue?&lt;/p&gt;

&lt;p&gt;These two axes are independent. They correlate sometimes, but treating them as the same thing is where teams lose sprint capacity.&lt;/p&gt;

&lt;h3&gt;The examples that make it click&lt;/h3&gt;

&lt;p&gt;I use two examples when explaining this to new QA engineers, and they tend to stick.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Low severity, high priority: the CEO's bio typo.&lt;/strong&gt; Someone misspelled the CEO's name on the company About page. The system works perfectly fine. No functionality is broken. No data is corrupted. Severity? Low. But the CEO noticed it, sent a message to the VP of Product, and now three people are asking when it will be fixed. Priority? High. Fix it today.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;High severity, low priority: the edge case crash.&lt;/strong&gt; There's a bug where the app crashes if a user enters exactly 47 special characters into a phone number field during registration. The app completely dies. Severity? High, it's a full crash. But it affects roughly 0.1% of users in a flow that has a validation fallback anyway. Nobody has actually reported it in production. Priority? Low. Log it, schedule it for a future sprint, move on.&lt;/p&gt;

&lt;p&gt;If your bug tracker doesn't let you set these independently, you'll default to whatever field you have and lose the nuance. This is exactly why we built &lt;a href="https://bugboard.co" rel="noopener noreferrer"&gt;BugBoard&lt;/a&gt; with separate severity and priority fields. The distinction matters for triage, and collapsing them into one dimension forces bad decisions.&lt;/p&gt;

&lt;h3&gt;What severity levels actually look like&lt;/h3&gt;

&lt;p&gt;I've seen teams use three levels, five levels, even seven. The number matters less than consistency. Here's a practical five-level scale that works across most projects:&lt;/p&gt;

&lt;h3&gt;Critical (sev-1)&lt;/h3&gt;

&lt;p&gt;The system is down, data is being lost or corrupted, or a core workflow is completely blocked for all users. Payment processing fails. Login is broken. The database is returning errors. There is no workaround.&lt;/p&gt;

&lt;p&gt;If you're debating whether something is sev-1, ask: "Can users do the primary thing they came here to do?" If the answer is no, it's sev-1.&lt;/p&gt;

&lt;h3&gt;Major (sev-2)&lt;/h3&gt;

&lt;p&gt;A significant feature is broken or behaving incorrectly, but the system is still usable. Users can work around it, but the workaround is painful or non-obvious. Think: search returns wrong results, file uploads fail intermittently, or a key report generates incorrect numbers.&lt;/p&gt;

&lt;h3&gt;Moderate (sev-3)&lt;/h3&gt;

&lt;p&gt;Something is clearly wrong but the impact is contained. A secondary feature misbehaves. A form doesn't validate one edge case properly. Sorting works on most columns but breaks on date fields. Users notice it but can still get their work done.&lt;/p&gt;

&lt;h3&gt;Minor (sev-4)&lt;/h3&gt;

&lt;p&gt;Cosmetic issues, UI inconsistencies, or small deviations from the spec that don't affect functionality. A button is slightly misaligned. A success message uses the wrong shade of green. Text truncates awkwardly at one specific viewport width.&lt;/p&gt;

&lt;h3&gt;Trivial (sev-5)&lt;/h3&gt;

&lt;p&gt;Issues so minor that most users would never notice them. A tooltip appears 200ms late. There's extra whitespace at the bottom of a page that only shows on one browser. The "about" link in the footer points to a slightly outdated version of the page.&lt;/p&gt;

&lt;h3&gt;What priority levels actually look like&lt;/h3&gt;

&lt;p&gt;Priority is a business decision, not a technical one. That's why product managers, project leads, or client stakeholders typically set priority, while QA engineers set severity. The people closest to the technical impact assess severity. The people closest to the business impact assess priority.&lt;/p&gt;

&lt;h3&gt;Immediate (P1)&lt;/h3&gt;

&lt;p&gt;Drop what you're doing and fix this now. The fix goes into the current sprint, possibly as a hotfix outside the normal release cycle. Reserved for situations where the bug is actively causing business damage: lost revenue, broken SLAs, security vulnerabilities being exploited.&lt;/p&gt;

&lt;h3&gt;High (P2)&lt;/h3&gt;

&lt;p&gt;Fix this in the current sprint. It's important enough to bump something else out of the sprint if needed. Stakeholders are watching. Customers have noticed.&lt;/p&gt;

&lt;h3&gt;Medium (P3)&lt;/h3&gt;

&lt;p&gt;Schedule this for the next sprint or two. It needs to get done, but it's not urgent enough to disrupt current work. Most bugs land here, and that's fine.&lt;/p&gt;

&lt;h3&gt;Low (P4)&lt;/h3&gt;

&lt;p&gt;Fix it when you have time. Put it in the backlog and revisit during grooming. If it never gets fixed because higher-priority work keeps coming in, that might be acceptable.&lt;/p&gt;

&lt;h3&gt;Won't fix / defer (P5)&lt;/h3&gt;

&lt;p&gt;The team acknowledges the bug exists but has decided not to fix it, at least not in the foreseeable future. Maybe the feature is being deprecated. Maybe the cost of fixing it outweighs the impact. Document the decision and move on.&lt;/p&gt;

&lt;h3&gt;The four quadrants that matter for triage&lt;/h3&gt;

&lt;p&gt;When you separate severity and priority into two independent fields, you get a 2x2 matrix that makes triage decisions almost mechanical:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;High severity + high priority:&lt;/strong&gt; Fix immediately. System crash affecting many users, critical security hole, data corruption in a production workflow. This is your "all hands on deck" category.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;High severity + low priority:&lt;/strong&gt; Schedule carefully. The bug is technically severe but the real-world impact is low because of how rarely it occurs or because a workaround exists. Don't ignore it, but don't let it hijack your sprint either.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Low severity + high priority:&lt;/strong&gt; Fix fast, but keep perspective. The CEO's typo. The client's logo rendered in the wrong color. A cosmetic issue on a landing page right before a big marketing push. Quick fix, high visibility, low technical risk.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Low severity + low priority:&lt;/strong&gt; Backlog it. Minor UI polish, edge case behaviors that almost nobody encounters, small inconsistencies that don't affect usability. Groom these periodically and close the ones that are no longer relevant.&lt;/p&gt;

&lt;h3&gt;Where teams actually lose time&lt;/h3&gt;

&lt;p&gt;The damage isn't theoretical. I've watched it happen across projects.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scenario 1: Everything is P1.&lt;/strong&gt; A product owner marks every bug as high priority because they want everything fixed. The dev team has thirty P1 tickets and no way to distinguish between a broken payment flow and a misaligned icon. So they pick based on what seems easiest, or whatever is closest to what they were already working on. The truly critical bugs get fixed by accident, not by design.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scenario 2: Severity drives priority by default.&lt;/strong&gt; The team uses a single field, or treats them as synonyms. A sev-1 crash that happens once a month in an internal admin tool gets treated with the same urgency as a sev-1 crash in the customer-facing checkout flow. One affects three people who already know the workaround. The other loses revenue every hour it's live.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scenario 3: Nobody updates priority after initial triage.&lt;/strong&gt; A bug was P3 when it was filed two months ago. Since then, the feature it affects has become the primary onboarding flow for a new enterprise client. It's now effectively P1, but nobody re-triaged it. The new client hits it on day one.&lt;/p&gt;

&lt;h3&gt;How we handle this at BetterQA&lt;/h3&gt;

&lt;p&gt;When we onboard a new client's QA process, one of the first things we audit is how they categorize bugs. More often than not, we find a single "priority" dropdown doing double duty, or severity levels that nobody on the team can define consistently.&lt;/p&gt;

&lt;p&gt;We standardize on two separate fields with clear definitions that the whole team agrees on. QA sets severity based on technical impact. Product sets priority based on business context. When the two conflict, that conflict is the conversation worth having in triage, not "is this a P1 or a P2?"&lt;/p&gt;

&lt;p&gt;In &lt;a href="https://bugboard.co" rel="noopener noreferrer"&gt;BugBoard&lt;/a&gt;, we enforce this separation at the tool level. Every bug has both fields. Reports can be filtered and sorted by either dimension independently. When you look at your backlog and filter for "high severity, low priority," you get a clear view of the technical debt that's accumulating quietly. When you filter for "low severity, high priority," you see the political fires that need quick attention.&lt;/p&gt;

&lt;h3&gt;Practical steps to fix this on your team&lt;/h3&gt;

&lt;p&gt;If your team is currently mixing these up, here's what I'd do:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Add both fields to your bug tracker.&lt;/strong&gt; If your tool only has one, add a custom field. Every bug gets both a severity and a priority rating.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Define who owns each field.&lt;/strong&gt; QA owns severity. Product or project management owns priority. If someone wants to change the other team's rating, that's a conversation, not a unilateral edit.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Write down your definitions.&lt;/strong&gt; Put your severity scale and priority scale somewhere the whole team can reference. One page, plain language, with examples. Revisit it quarterly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Use the 2x2 in triage.&lt;/strong&gt; When reviewing new bugs, plot them mentally on the severity/priority grid. The quadrant tells you what to do. Stop debating feelings and start making decisions based on two clear dimensions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Re-triage periodically.&lt;/strong&gt; Priorities change. A P4 bug in January might be a P2 by March because the product roadmap shifted. Build re-triage into your grooming cadence.&lt;/p&gt;

&lt;h3&gt;It's a small distinction with a big payoff&lt;/h3&gt;

&lt;p&gt;Getting severity and priority right won't make your bugs disappear. But it will make your triage meetings shorter, your sprint planning more accurate, and your team less frustrated. When everyone agrees on what "this is critical" actually means, you stop arguing about vocabulary and start fixing the right things in the right order.&lt;/p&gt;

&lt;p&gt;That's the whole point.&lt;/p&gt;

&lt;p&gt;For more on QA practices, bug reporting, and how independent testing teams handle triage at scale, check out the &lt;a href="https://betterqa.co/blog" rel="noopener noreferrer"&gt;BetterQA blog&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>testing</category>
      <category>security</category>
    </item>
    <item>
      <title>API testing with Cypress: the part most teams skip</title>
      <dc:creator>Tudor Brad</dc:creator>
      <pubDate>Thu, 09 Apr 2026 08:09:02 +0000</pubDate>
      <link>https://dev.to/tudorsss-betterqa/api-testing-with-cypress-the-part-most-teams-skip-2fea</link>
      <guid>https://dev.to/tudorsss-betterqa/api-testing-with-cypress-the-part-most-teams-skip-2fea</guid>
      <description>&lt;p&gt;I spend most of my working hours writing Cypress tests. UI flows, login forms, dashboards, the usual. But the tests that have saved me the most time and headaches over the past few years are the ones that never open a browser at all.&lt;/p&gt;

&lt;p&gt;They hit the API directly with &lt;code&gt;cy.request()&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;And almost nobody writes them.&lt;/p&gt;

&lt;h3&gt;The bug that only the API knew about&lt;/h3&gt;

&lt;p&gt;A few months ago I was testing a project management app for a client. The UI looked perfect. You could create a task, assign it, mark it done. All green in Cypress. Ship it.&lt;/p&gt;

&lt;p&gt;Except the API was returning a 500 on every third POST request when the task description contained special characters. The frontend was silently swallowing the error and showing a success toast anyway because the developer had wrapped everything in a try-catch that defaulted to "ok."&lt;/p&gt;

&lt;p&gt;The user would create a task, see a success message, and the task would simply not exist. No error. No feedback. Just gone.&lt;/p&gt;

&lt;p&gt;I caught it by accident when I added a &lt;code&gt;cy.request()&lt;/code&gt; test for the create endpoint. The UI tests had been green for weeks.&lt;/p&gt;

&lt;p&gt;That's the problem. If you only test through the UI, you're testing the frontend's ability to hide failures. You're not testing whether the backend actually works.&lt;/p&gt;

&lt;h3&gt;Why cy.request() and not Postman?&lt;/h3&gt;

&lt;p&gt;Fair question. At &lt;a href="https://betterqa.co" rel="noopener noreferrer"&gt;BetterQA&lt;/a&gt; we use both. Postman is great for exploratory API testing and for sharing collections with developers. But when I need API tests running in the same pipeline as my UI tests, using the same config, the same env variables, the same reporting, &lt;code&gt;cy.request()&lt;/code&gt; wins.&lt;/p&gt;

&lt;p&gt;No extra tooling. No separate runner. No "well the Postman tests passed in Newman but the Cypress tests failed" confusion.&lt;/p&gt;

&lt;p&gt;If your team already has Cypress installed, you have an API testing framework. You're just not using it yet.&lt;/p&gt;

&lt;h3&gt;The basics: hitting an endpoint and checking what comes back&lt;/h3&gt;

&lt;p&gt;Here's what a real API test looks like. Nothing fancy.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nf"&gt;describe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Users API&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;it&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;returns a list of users with the expected shape&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;cy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;request&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;GET&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/api/users&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;then&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;to&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;eq&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;to&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;be&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;an&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;array&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;to&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;be&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;greaterThan&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;user&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
      &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;to&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;have&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;property&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;id&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;to&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;have&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;property&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;email&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="nf"&gt;it&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;creates a user and gets back a real ID&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;cy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;request&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;POST&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/api/users&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Test User&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;email&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`test-&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()}&lt;/span&gt;&lt;span class="s2"&gt;@example.com`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;}).&lt;/span&gt;&lt;span class="nf"&gt;then&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;to&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;eq&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;201&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;to&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;eq&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Test User&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;to&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;be&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;a&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;number&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice the &lt;code&gt;Date.now()&lt;/code&gt; in the email. I learned this the hard way: if your test creates data, make it unique every run. Otherwise your second pipeline run fails with a "duplicate email" error and you waste 20 minutes debugging what looks like a product bug but is really stale test data.&lt;/p&gt;
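
&lt;p&gt;A timestamp usually suffices, but two specs creating data in the same millisecond (parallel CI runs do this) can still collide. A hypothetical helper I'd pull into a support file adds a random suffix on top:&lt;/p&gt;

```javascript
// Hypothetical test-data helper: timestamp plus a random base-36 suffix,
// so two parallel runs in the same millisecond still get distinct emails.
function uniqueEmail(prefix = 'test') {
  const suffix = Math.floor(Math.random() * 1e9).toString(36);
  return `${prefix}-${Date.now()}-${suffix}@example.com`;
}
```

&lt;p&gt;Drop it into the &lt;code&gt;body&lt;/code&gt; of the POST above in place of the inline template string.&lt;/p&gt;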

&lt;h3&gt;Authentication: the part people get stuck on&lt;/h3&gt;

&lt;p&gt;Most real APIs need auth. Here are the two patterns I use constantly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bearer tokens (JWT, OAuth, etc.):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nf"&gt;describe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Protected endpoints&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;authToken&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="nf"&gt;before&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;cy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;request&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;POST&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/api/auth/login&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;email&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Cypress&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;env&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;TEST_USER_EMAIL&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="na"&gt;password&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Cypress&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;env&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;TEST_USER_PASSWORD&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;}).&lt;/span&gt;&lt;span class="nf"&gt;then&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;authToken&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;token&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="nf"&gt;it&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;returns profile data with valid token&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;cy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;request&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;GET&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/api/profile&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;Authorization&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`Bearer &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;authToken&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;}).&lt;/span&gt;&lt;span class="nf"&gt;then&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;to&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;eq&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;email&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;to&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;eq&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;Cypress&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;env&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;TEST_USER_EMAIL&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="nf"&gt;it&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;rejects requests with no token&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;cy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;request&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;GET&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/api/profile&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;failOnStatusCode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;}).&lt;/span&gt;&lt;span class="nf"&gt;then&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;to&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;eq&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;401&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;API key in a header:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;cy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;request&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;GET&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/api/data&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;X-API-Key&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Cypress&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;env&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;API_KEY&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Keep your credentials in &lt;code&gt;cypress.env.json&lt;/code&gt; (and add that file to &lt;code&gt;.gitignore&lt;/code&gt; right now if you haven't). In CI, pass them as environment variables prefixed with &lt;code&gt;CYPRESS_&lt;/code&gt;.&lt;/p&gt;
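&lt;p&gt;A minimal &lt;code&gt;cypress.env.json&lt;/code&gt; might look like this (the key names are illustrative, use whatever your tests read through &lt;code&gt;Cypress.env()&lt;/code&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;{
  "API_KEY": "not-a-real-key",
  "TEST_USER_EMAIL": "qa@example.com",
  "TEST_USER_PASSWORD": "not-a-real-password"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In CI, the same values arrive as &lt;code&gt;CYPRESS_API_KEY&lt;/code&gt;, &lt;code&gt;CYPRESS_TEST_USER_EMAIL&lt;/code&gt;, and so on; Cypress strips the &lt;code&gt;CYPRESS_&lt;/code&gt; prefix before handing them to &lt;code&gt;Cypress.env()&lt;/code&gt;.&lt;/p&gt;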

&lt;p&gt;One thing that bites people: &lt;code&gt;failOnStatusCode: false&lt;/code&gt;. Without it, Cypress treats any status outside the 2xx and 3xx range as a test failure and throws. When you're intentionally testing a 401 or 404, you need this flag. I forget it about once a month.&lt;/p&gt;

&lt;h3&gt;
  
  
  Schema validation: the test that catches breaking changes
&lt;/h3&gt;

&lt;p&gt;This is where API tests really earn their keep. Backend developers change response structures. They rename a field from &lt;code&gt;created_at&lt;/code&gt; to &lt;code&gt;createdAt&lt;/code&gt;. They drop a property. They add a nested object where there used to be a string.&lt;/p&gt;

&lt;p&gt;Your UI might still work because JavaScript is forgiving. But your mobile client breaks. Or your integration partner's webhook stops parsing. Or the data is silently wrong.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nf"&gt;it&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;user response has the required fields and types&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;cy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;request&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;GET&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/api/users/1&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;then&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;to&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;have&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;all&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;keys&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;id&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;name&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;email&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;created_at&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;role&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
    &lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;to&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;be&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;a&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;number&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;to&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;be&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;a&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;string&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;email&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;to&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;match&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/^&lt;/span&gt;&lt;span class="se"&gt;[^\s&lt;/span&gt;&lt;span class="sr"&gt;@&lt;/span&gt;&lt;span class="se"&gt;]&lt;/span&gt;&lt;span class="sr"&gt;+@&lt;/span&gt;&lt;span class="se"&gt;[^\s&lt;/span&gt;&lt;span class="sr"&gt;@&lt;/span&gt;&lt;span class="se"&gt;]&lt;/span&gt;&lt;span class="sr"&gt;+&lt;/span&gt;&lt;span class="se"&gt;\.[^\s&lt;/span&gt;&lt;span class="sr"&gt;@&lt;/span&gt;&lt;span class="se"&gt;]&lt;/span&gt;&lt;span class="sr"&gt;+$/&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;role&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;to&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;be&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;oneOf&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;admin&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;editor&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This test takes about 50 milliseconds to run, and it will catch a breaking API change before your users do. Fifty milliseconds for that kind of safety net is a trade I will take every single time.&lt;/p&gt;

&lt;p&gt;For bigger projects, look into &lt;code&gt;chai-json-schema&lt;/code&gt; for full JSON Schema validation. But honestly, the simple assertions above cover 80% of what I need.&lt;/p&gt;
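&lt;p&gt;If you do reach for it, the setup is small. A sketch, assuming &lt;code&gt;chai-json-schema&lt;/code&gt; is installed and registered in your support file (the schema mirrors the assertions above):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// cypress/support/e2e.js
chai.use(require('chai-json-schema'));

// in a spec file
const userSchema = {
  type: 'object',
  required: ['id', 'name', 'email', 'created_at', 'role'],
  properties: {
    id: { type: 'number' },
    name: { type: 'string' },
    email: { type: 'string' },
    role: { enum: ['admin', 'user', 'editor'] },
  },
};

it('user response matches the schema', () =&amp;gt; {
  cy.request('GET', '/api/users/1').then((response) =&amp;gt; {
    expect(response.body).to.be.jsonSchema(userSchema);
  });
});
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;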

&lt;h3&gt;
  
  
  Testing the unhappy paths
&lt;/h3&gt;

&lt;p&gt;Every junior tester writes tests for when things go right. The tests that matter are the ones for when things go wrong.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nf"&gt;describe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Error handling&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;it&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;returns 404 for a user that does not exist&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;cy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;request&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;GET&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/api/users/999999&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;failOnStatusCode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;}).&lt;/span&gt;&lt;span class="nf"&gt;then&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;to&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;eq&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;404&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;to&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;have&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;property&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;error&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="nf"&gt;it&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;returns 400 when required fields are missing&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;cy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;request&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;POST&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/api/users&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;''&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="na"&gt;failOnStatusCode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;}).&lt;/span&gt;&lt;span class="nf"&gt;then&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;to&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;eq&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;400&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;errors&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;to&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;be&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;an&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;array&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I check three things on every error response: correct status code, an error message that makes sense, and that the body does not leak internal details (stack traces, database errors, file paths). You'd be surprised how many APIs return a full Node.js stack trace on a 500.&lt;/p&gt;
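&lt;p&gt;The leak check is easy to automate: scan the serialized error body for a few marker strings. A rough sketch (the markers here are examples, tune the list to your stack):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;const leakMarkers = ['at Object.', 'node_modules', 'ECONNREFUSED', 'SELECT ', '/home/'];

it('error responses do not leak internals', () =&amp;gt; {
  cy.request({ url: '/api/users/999999', failOnStatusCode: false }).then((response) =&amp;gt; {
    const body = JSON.stringify(response.body);
    leakMarkers.forEach((marker) =&amp;gt; {
      expect(body, `body leaks "${marker}"`).to.not.include(marker);
    });
  });
});
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;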

&lt;h3&gt;
  
  
  Combining API and UI tests
&lt;/h3&gt;

&lt;p&gt;This is where Cypress really shines compared to standalone API tools. You can set up data through the API and then verify it in the UI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nf"&gt;it&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;shows a newly created task on the dashboard&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// Create data via API (fast, reliable)&lt;/span&gt;
  &lt;span class="nx"&gt;cy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;request&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;POST&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/api/tasks&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;Authorization&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`Bearer &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;authToken&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;title&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Fix login bug&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;priority&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;high&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;}).&lt;/span&gt;&lt;span class="nf"&gt;then&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;taskId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="c1"&gt;// Verify it shows up in the UI&lt;/span&gt;
    &lt;span class="nx"&gt;cy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;visit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/dashboard&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nx"&gt;cy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;contains&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Fix login bug&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;should&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;be.visible&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nx"&gt;cy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`[data-task-id="&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;taskId&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"]`&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;should&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;exist&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This pattern is faster and more reliable than creating data through the UI. Click-based setup is fragile. API-based setup gives you a known state in milliseconds.&lt;/p&gt;

&lt;h3&gt;
  
  
  Mocking APIs with cy.intercept()
&lt;/h3&gt;

&lt;p&gt;Sometimes you need to test how the frontend handles a broken backend. That's where &lt;code&gt;cy.intercept()&lt;/code&gt; comes in:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nf"&gt;it&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;shows an error message when the API is down&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;cy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;intercept&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;GET&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/api/users&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;statusCode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Internal Server Error&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;}).&lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;getUsers&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="nx"&gt;cy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;visit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/users&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nx"&gt;cy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;wait&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@getUsers&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nx"&gt;cy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;contains&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Something went wrong&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;should&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;be.visible&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="nf"&gt;it&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;shows empty state when there is no data&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;cy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;intercept&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;GET&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/api/users&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;statusCode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[],&lt;/span&gt;
  &lt;span class="p"&gt;}).&lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;getUsers&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="nx"&gt;cy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;visit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/users&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nx"&gt;cy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;wait&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@getUsers&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nx"&gt;cy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;contains&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;No users found&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;should&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;be.visible&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I use this to test every error state the designer put in the mockups. If there's an empty state in the Figma file, there should be a test that forces that state through &lt;code&gt;cy.intercept()&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Organizing your tests so they don't become a mess
&lt;/h3&gt;

&lt;p&gt;Once you have more than five or six API test files, structure matters.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cypress/
  e2e/
    api/
      users.cy.js
      auth.cy.js
      orders.cy.js
      payments.cy.js
    ui/
      login.cy.js
      dashboard.cy.js
  support/
    commands.js
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And pull repeated API calls into custom commands:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// cypress/support/commands.js&lt;/span&gt;
&lt;span class="nx"&gt;Cypress&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Commands&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;apiLogin&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;email&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;password&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;cy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;request&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;POST&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/api/auth/login&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;email&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;password&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;}).&lt;/span&gt;&lt;span class="nf"&gt;then&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nb"&gt;window&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;localStorage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setItem&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;token&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;token&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;token&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="c1"&gt;// In your tests:&lt;/span&gt;
&lt;span class="nf"&gt;beforeEach&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;cy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;apiLogin&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;Cypress&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;env&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;TEST_USER_EMAIL&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="nx"&gt;Cypress&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;env&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;TEST_USER_PASSWORD&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I create a custom command for every API operation I call more than twice. Login, create user, create resource, cleanup. This keeps test files short and readable.&lt;/p&gt;
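&lt;p&gt;Cleanup commands follow the same shape. A sketch (the endpoint is hypothetical, mirror your own API):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// cypress/support/commands.js
Cypress.Commands.add('apiDeleteTask', (taskId, authToken) =&amp;gt; {
  return cy.request({
    method: 'DELETE',
    url: `/api/tasks/${taskId}`,
    headers: { Authorization: `Bearer ${authToken}` },
    // don't fail the test if the resource is already gone
    failOnStatusCode: false,
  });
});
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;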

&lt;h3&gt;
  
  
  Running API tests in CI
&lt;/h3&gt;

&lt;p&gt;API tests are fast because they never load a page or render a UI. A suite of 50 API tests finishes in under 10 seconds. Add them to your pipeline:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# GitHub Actions&lt;/span&gt;
&lt;span class="na"&gt;api-tests&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
  &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v4&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;npm install&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;npx cypress run --spec "cypress/e2e/api/**"&lt;/span&gt;
      &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;CYPRESS_BASE_URL&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.API_BASE_URL }}&lt;/span&gt;
        &lt;span class="na"&gt;CYPRESS_TEST_USER_EMAIL&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.TEST_EMAIL }}&lt;/span&gt;
        &lt;span class="na"&gt;CYPRESS_TEST_USER_PASSWORD&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.TEST_PASSWORD }}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run them on every PR. They're cheap and they catch real bugs.&lt;/p&gt;

&lt;h3&gt;
  
  
  When Cypress is not the right API testing tool
&lt;/h3&gt;

&lt;p&gt;I'm not going to pretend Cypress is always the answer. Use Postman or a dedicated API framework when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You're testing APIs that have no frontend at all&lt;/li&gt;
&lt;li&gt;You need to generate load or stress test endpoints&lt;/li&gt;
&lt;li&gt;You want API documentation generated from your test definitions&lt;/li&gt;
&lt;li&gt;Your API tests need to run outside a Node.js environment&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For everything else, especially when your team already has Cypress in the repo, just write the &lt;code&gt;cy.request()&lt;/code&gt; tests. You already have the tool. Use it.&lt;/p&gt;

&lt;h3&gt;
  
  
  What I'd add to any test suite tomorrow
&lt;/h3&gt;

&lt;p&gt;If I had to pick three API tests to add to a project that has zero, these are the ones:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Health check test&lt;/strong&gt; - Hit &lt;code&gt;/api/health&lt;/code&gt; or your main endpoint. Confirm it returns 200. This is your canary. If this fails, something is very wrong.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Auth rejection test&lt;/strong&gt; - Hit a protected endpoint with no token. Confirm you get 401, not 200. You would not believe how many APIs return data to unauthenticated requests.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Schema test on your most-used endpoint&lt;/strong&gt; - Pick the endpoint the frontend calls most. Assert every field name and type. This catches breaking changes before they reach production.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Three tests. Maybe 15 minutes to write. They'll save you hours.&lt;/p&gt;
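
&lt;p&gt;Tests 1 and 2 are one-line assertions on status codes. Test 3 needs a small helper. Here is a framework-agnostic sketch of the kind of schema check I mean; the helper name and the example endpoint shape are illustrative, not a Cypress API:&lt;/p&gt;

```javascript
// Minimal schema check: every expected field must exist and have the right type.
// Illustrative helper, not part of Cypress; call it from any test framework.
function assertSchema(body, schema) {
  const errors = [];
  for (const [field, type] of Object.entries(schema)) {
    if (!(field in body)) {
      errors.push(`missing field: ${field}`);
    } else if (typeof body[field] !== type) {
      errors.push(`wrong type for ${field}: expected ${type}, got ${typeof body[field]}`);
    }
  }
  if (errors.length > 0) {
    throw new Error(`Schema mismatch: ${errors.join('; ')}`);
  }
}

// Example: the shape your frontend relies on for a hypothetical /api/users/:id
assertSchema(
  { id: 42, email: 'test@example.com', isActive: true },
  { id: 'number', email: 'string', isActive: 'boolean' }
);
```

&lt;p&gt;In a Cypress spec, you would call this from the &lt;code&gt;.then()&lt;/code&gt; callback of a &lt;code&gt;cy.request()&lt;/code&gt;, passing &lt;code&gt;res.body&lt;/code&gt; and the expected shape.&lt;/p&gt;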




&lt;p&gt;We write tests like these on client projects every week at BetterQA, usually alongside Postman collections and full E2E suites. If you want to read more about how we approach testing, check out &lt;a href="https://betterqa.co/blog" rel="noopener noreferrer"&gt;betterqa.co/blog&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>testing</category>
      <category>automation</category>
      <category>devops</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Selenium vs Cypress: what we actually use and why</title>
      <dc:creator>Tudor Brad</dc:creator>
      <pubDate>Thu, 09 Apr 2026 08:08:57 +0000</pubDate>
      <link>https://dev.to/tudorsss-betterqa/selenium-vs-cypress-what-we-actually-use-and-why-od6</link>
      <guid>https://dev.to/tudorsss-betterqa/selenium-vs-cypress-what-we-actually-use-and-why-od6</guid>
      <description>&lt;p&gt;We have about 50 engineers across 24 countries working on client QA projects. On any given week, some of those projects run Cypress, some run Selenium, and a few run both. We did not pick sides. The client's stack, timeline, and constraints pick for us.&lt;/p&gt;

&lt;p&gt;This is what we have learned from running both frameworks in production across dozens of projects. Not a feature matrix you can find on either tool's website. The actual pains and gains we deal with.&lt;/p&gt;

&lt;h3&gt;
  
  
  Where Cypress wins and why we reach for it
&lt;/h3&gt;

&lt;p&gt;Cypress is the faster path to a working test suite on most modern web apps. That is the single biggest gain.&lt;/p&gt;

&lt;p&gt;On a React or Vue SPA, a new tester can have a Cypress test running within an hour of cloning the repo. Install it, write a spec, run it. No driver downloads, no browser binaries to manage, no WebDriver protocol quirks. The test runner shows you what happened at each step with DOM snapshots. When a test fails, you can time-travel through the state to see exactly what went wrong.&lt;/p&gt;

&lt;p&gt;For teams that write JavaScript and build SPAs, Cypress removes a pile of friction:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;No driver management.&lt;/strong&gt; Selenium needs ChromeDriver, GeckoDriver, etc., and they break every time Chrome auto-updates. Cypress bundles its own browser management.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Automatic waiting.&lt;/strong&gt; Cypress retries assertions until they pass or time out. In Selenium, you write explicit waits or sleep statements, and you still get flaky tests.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Network stubbing built in.&lt;/strong&gt; Intercepting API calls, mocking responses, testing error states: all native. In Selenium, you need a proxy tool like BrowserMob or mitmproxy.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Readable test output.&lt;/strong&gt; The Test Runner GUI is genuinely useful for debugging. Selenium's output is a stack trace and a prayer.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We use Cypress on most greenfield SPA projects unless the client has a specific reason not to. It is the default recommendation when someone asks "what should we automate with?"&lt;/p&gt;

&lt;h3&gt;
  
  
  Where Cypress hurts
&lt;/h3&gt;

&lt;p&gt;Here is the part that Cypress's marketing does not put on the homepage.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Single browser tab only.&lt;/strong&gt; Cypress runs inside the browser. It cannot open a second tab. If your app opens a link in a new tab, sends you to an OAuth provider in another window, or does anything involving multiple browser contexts, you are stuck. We have had to rewrite application code to work around this on two separate client projects. That is not a testing framework problem, that is a testing framework creating an application problem.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cross-origin is painful.&lt;/strong&gt; Cypress historically blocked cross-origin navigation entirely. They added &lt;code&gt;cy.origin()&lt;/code&gt; to handle it, but it is clunky. If your login flow redirects through an identity provider on a different domain, expect to spend time fighting Cypress rather than testing your app.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;JavaScript only.&lt;/strong&gt; Your test code must be JavaScript or TypeScript. If the team writes Python or Java and nobody knows JS, Cypress is not "easy to learn." It is easy to learn &lt;em&gt;if you already know the language it requires.&lt;/em&gt; We have had QA engineers comfortable with Python spend weeks getting productive in Cypress because the language was the barrier, not the framework.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No mobile testing.&lt;/strong&gt; Cypress tests web browsers. Period. If you need to test a native mobile app, or even a responsive site in an actual mobile browser, you need a different tool. We pair Cypress with Appium on projects that have both web and mobile, which means maintaining two frameworks anyway.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;iframes are a headache.&lt;/strong&gt; Cypress and iframes have a long, troubled history. The &lt;code&gt;cy.iframe()&lt;/code&gt; command from community plugins works sometimes. Payment forms (Stripe, Braintree) that embed in iframes are consistently annoying to test with Cypress.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No parallel by default.&lt;/strong&gt; Cypress's free tier runs tests sequentially. Parallel execution requires Cypress Cloud (paid) or a third-party orchestrator. On a project with 400+ tests, sequential runs took over 40 minutes. That kills CI feedback loops.&lt;/p&gt;

&lt;h3&gt;
  
  
  Where Selenium wins and why it survives
&lt;/h3&gt;

&lt;p&gt;Selenium is 20+ years old and looks it. The API is verbose. The documentation sprawls across multiple projects. Setting up a grid for parallel execution is an infrastructure project in itself. Nobody loves writing Selenium tests.&lt;/p&gt;

&lt;p&gt;But Selenium handles things Cypress cannot:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Any browser, any language.&lt;/strong&gt; Java, Python, C#, Ruby, JavaScript, Kotlin. Chrome, Firefox, Safari, Edge, even IE if you have been cursed. A QA team can use whatever language they already know.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multiple tabs and windows.&lt;/strong&gt; &lt;code&gt;driver.switchTo().window()&lt;/code&gt; just works. OAuth flows, popup windows, payment redirects: all testable without workarounds.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cross-origin is not special.&lt;/strong&gt; Selenium controls the browser from outside. It does not care what domain you navigate to.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mobile testing via Appium.&lt;/strong&gt; Appium is built on the WebDriver protocol. Skills and patterns transfer directly from Selenium to Appium. Your page object models work in both.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mature ecosystem.&lt;/strong&gt; Selenium Grid, Docker images, cloud providers (BrowserStack, Sauce Labs, LambdaTest) all support Selenium natively. The infrastructure is battle-tested.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Non-browser automation.&lt;/strong&gt; With Appium's desktop drivers, you can automate Windows and macOS desktop apps using the same WebDriver API. Cypress cannot touch anything outside a browser.&lt;/li&gt;
&lt;/ul&gt;
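
&lt;p&gt;For multi-window flows, the pattern every Selenium suite converges on is: capture the handle list, trigger the popup, then switch to whichever handle is new. The set-difference step is plain code; a small sketch below (the commented driver calls follow the selenium-webdriver Node API, but treat the exact flow as illustrative):&lt;/p&gt;

```javascript
// Given the window handles before and after an action, return the new one.
// Pure helper; usable from any WebDriver binding that exposes handle lists.
function newWindowHandle(before, after) {
  const known = new Set(before);
  const fresh = after.filter((h) => !known.has(h));
  if (fresh.length !== 1) {
    throw new Error(`expected exactly one new window, found ${fresh.length}`);
  }
  return fresh[0];
}

// Sketch of how this sits in a Selenium test (selenium-webdriver, Node):
//   const before = await driver.getAllWindowHandles();
//   await driver.findElement(By.linkText('Pay now')).click();
//   const after = await driver.getAllWindowHandles();
//   await driver.switchTo().window(newWindowHandle(before, after));
```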

&lt;p&gt;We use Selenium on projects with complex auth flows, multi-window interactions, legacy browser requirements, or mixed web-and-mobile testing needs. It is also our choice when the QA team already has Java or Python expertise and there is no budget to retrain.&lt;/p&gt;

&lt;h3&gt;
  
  
  The pains we live with on Selenium
&lt;/h3&gt;

&lt;p&gt;Selenium's problems are real and we deal with them weekly:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Flaky tests from timing issues.&lt;/strong&gt; Selenium does not auto-wait. You write explicit waits, implicit waits, fluent waits. You still get &lt;code&gt;StaleElementReferenceException&lt;/code&gt; at 2 AM in CI. Every Selenium project accumulates a utility class of retry helpers, and every team writes them slightly differently.&lt;/p&gt;
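
&lt;p&gt;Those retry helpers all converge on the same shape: re-run the action when a known-transient error shows up, and let the re-run re-find the element. A stripped-down version in JavaScript; the error name matches selenium-webdriver's &lt;code&gt;StaleElementReferenceError&lt;/code&gt;, while the attempt count and delay are arbitrary defaults:&lt;/p&gt;

```javascript
// Re-run `action` when it fails with a transient WebDriver error.
// Passing a function (not an element) means each attempt re-finds the element.
async function retryOnStale(action, attempts = 3, delayMs = 200) {
  let lastError;
  for (let i = attempts; i > 0; i--) {
    try {
      return await action();
    } catch (err) {
      if (err.name !== 'StaleElementReferenceError') throw err; // only retry the transient case
      lastError = err;
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw lastError;
}
```

&lt;p&gt;Usage looks like &lt;code&gt;retryOnStale(() =&amp;gt; driver.findElement(locator).click())&lt;/code&gt;, so the locator is resolved fresh on every attempt.&lt;/p&gt;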

&lt;p&gt;&lt;strong&gt;Driver version mismatches.&lt;/strong&gt; Chrome 124 ships, ChromeDriver 124 is not ready yet, CI breaks. Selenium Manager (added in Selenium 4.6) helps, but we still see this on projects with locked-down CI environments that cannot auto-download drivers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Verbose test code.&lt;/strong&gt; A simple "click this button and check the text" test is 15 lines in Selenium and 3 lines in Cypress. Over hundreds of tests, that verbosity adds up. Code reviews take longer. New team members need more ramp-up time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Grid management overhead.&lt;/strong&gt; Running Selenium Grid (even with Docker) is operational work. Someone has to maintain the images, handle node scaling, debug session allocation. Cloud providers solve this but cost money.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No built-in visual feedback.&lt;/strong&gt; When a Selenium test fails, you get a stack trace. Maybe a screenshot if you configured the teardown to capture one. There is no interactive debugger, no time-travel, no DOM snapshot. You read logs and re-run.&lt;/p&gt;

&lt;h3&gt;
  
  
  What about Playwright?
&lt;/h3&gt;

&lt;p&gt;We would be dishonest if we did not mention Playwright here. Microsoft's framework has taken over a significant chunk of new projects since 2023. It handles multi-tab, cross-origin, and multiple browsers natively. It auto-waits like Cypress. It supports JavaScript, TypeScript, Python, Java, and C#.&lt;/p&gt;

&lt;p&gt;On new projects where the team has no existing framework investment, we now recommend Playwright over both Selenium and Cypress more often than not. But Playwright is not the point of this article, and the reality is that most of our active client projects still run Selenium or Cypress because switching frameworks mid-project rarely makes business sense.&lt;/p&gt;

&lt;h3&gt;
  
  
  What we built to deal with both
&lt;/h3&gt;

&lt;p&gt;One problem we kept hitting: QA engineers who were strong at manual testing but struggled to write automation code in either framework. We built &lt;a href="https://chromewebstore.google.com/detail/betterqa-flows/bkjgfhiglabncnhpejmpjjhagkbkamfp" rel="noopener noreferrer"&gt;Flows&lt;/a&gt;, a Chrome extension that records browser interactions visually and exports them as executable tests.&lt;/p&gt;

&lt;p&gt;Flows does not replace either framework. It gives manual testers a way to create automated tests without writing code, and it gives automation engineers a starting point they can refine. When a recorded flow captures a complex user journey, the engineer can export it and clean it up rather than writing every step from scratch.&lt;/p&gt;

&lt;p&gt;We built it because we were tired of the same bottleneck on every project: too many manual test cases, too few automation engineers, and a backlog of "we should automate this" tickets that never got done.&lt;/p&gt;

&lt;h3&gt;
  
  
  How we decide on each project
&lt;/h3&gt;

&lt;p&gt;Our actual decision process is not complicated:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pick Cypress when:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The app is a JavaScript/TypeScript SPA&lt;/li&gt;
&lt;li&gt;The team knows JS&lt;/li&gt;
&lt;li&gt;There are no multi-tab or cross-origin flows&lt;/li&gt;
&lt;li&gt;No mobile testing requirement&lt;/li&gt;
&lt;li&gt;The client wants fast CI feedback on a small-to-medium test suite&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Pick Selenium when:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The team knows Java, Python, or C# and does not want to learn JS&lt;/li&gt;
&lt;li&gt;The app has multi-window flows, OAuth redirects, or iframe-heavy payment forms&lt;/li&gt;
&lt;li&gt;Mobile testing is also needed (Appium integration)&lt;/li&gt;
&lt;li&gt;The client requires Safari or legacy browser coverage&lt;/li&gt;
&lt;li&gt;There is an existing Selenium suite that works&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Consider Playwright when:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Starting fresh with no existing framework&lt;/li&gt;
&lt;li&gt;Need multi-browser, multi-tab, and cross-origin support&lt;/li&gt;
&lt;li&gt;Team can work in JS/TS, Python, Java, or C#&lt;/li&gt;
&lt;li&gt;The client is open to a newer tool&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Use Flows when:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Manual testers need to contribute to automation&lt;/li&gt;
&lt;li&gt;There is a large backlog of manual test cases to convert&lt;/li&gt;
&lt;li&gt;The team wants visual test recording regardless of the target framework&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Neither framework solves your real problem
&lt;/h3&gt;

&lt;p&gt;The honest answer nobody wants to hear: the framework choice matters less than most teams think. We have seen terrible test suites in Cypress and excellent ones in Selenium. The difference was never the tool. It was whether the team had clear test strategies, maintained their tests, and ran them consistently.&lt;/p&gt;

&lt;p&gt;A Cypress suite that nobody maintains after sprint 3 is worse than no automation at all. It gives false confidence. A Selenium suite with proper page objects, good waits, and regular maintenance catches real bugs in production.&lt;/p&gt;

&lt;p&gt;Pick the tool that fits your team and your app. Invest the time you save on setup into writing tests that actually matter. If you are spending more time debating frameworks than writing tests, you have already lost.&lt;/p&gt;




&lt;p&gt;We write about testing from the perspective of a team that does it for a living across dozens of client projects. More at &lt;a href="https://betterqa.co/blog" rel="noopener noreferrer"&gt;betterqa.co/blog&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>testing</category>
      <category>automation</category>
      <category>devops</category>
      <category>webdev</category>
    </item>
    <item>
      <title>We built an accessibility tool because spreadsheet audits were killing us</title>
      <dc:creator>Tudor Brad</dc:creator>
      <pubDate>Thu, 09 Apr 2026 07:45:28 +0000</pubDate>
      <link>https://dev.to/tudorsss-betterqa/we-built-an-accessibility-tool-because-spreadsheet-audits-were-killing-us-1km9</link>
      <guid>https://dev.to/tudorsss-betterqa/we-built-an-accessibility-tool-because-spreadsheet-audits-were-killing-us-1km9</guid>
      <description>&lt;p&gt;There's a specific kind of despair that comes from opening the ninth spreadsheet in a WCAG audit, the one you're pretty sure somebody duplicated from the wrong version two weeks ago, and finding that step 14 of the login journey is marked "Fail" in your copy and "Not Tested" in the reviewer's copy.&lt;/p&gt;

&lt;p&gt;Which one is right? Nobody knows. The tester who originally logged it is on PTO. The screenshot is somewhere in Slack, probably in a thread that got buried under a deployment argument.&lt;/p&gt;

&lt;p&gt;That was us, about two years ago, during a healthcare accessibility audit in the US. The client was strict, the regulations were strict, everything was strict except our tooling, which was held together with Google Sheets, manual dates, and hope.&lt;/p&gt;

&lt;p&gt;We had nine spreadsheets open. One tracked testers. One tracked severity. One tracked notes. One was apparently a backup of another one, but with different data. Screenshots lived in Slack channels, sometimes in DMs, sometimes attached to Jira tickets that referenced a different version of the WCAG criteria.&lt;/p&gt;

&lt;p&gt;And then we found the conflicting reports. Same user flow, same step, two different testers, two different results. One said Fail with a note about missing alt text. The other said Not Tested. Both had been submitted to the client in the same week.&lt;/p&gt;

&lt;p&gt;That was the moment we stopped patching the process and started building something.&lt;/p&gt;

&lt;h3&gt;
  
  
  The tool is called Auditi
&lt;/h3&gt;

&lt;p&gt;It lives at &lt;a href="https://auditi.ro" rel="noopener noreferrer"&gt;auditi.ro&lt;/a&gt;. We built it at &lt;a href="https://betterqa.co" rel="noopener noreferrer"&gt;BetterQA&lt;/a&gt; because nothing else matched how we actually test accessibility: by user journeys, broken into steps, with everything traceable back to a specific tester, date, platform, and WCAG criterion.&lt;/p&gt;

&lt;p&gt;The core idea is simple. You model journeys the way a user experiences them. Login flow. Checkout flow. Onboarding. Each journey has steps. Each step gets an audit result: pass, fail, or not applicable. Every result has a tester name, severity, notes, evidence files, and a timestamp.&lt;/p&gt;

&lt;p&gt;That sounds obvious. It isn't. In spreadsheet world, you're tracking all of that across columns, tabs, and files. Somebody renames a column. Somebody adds rows in the middle. Somebody filters by severity and forgets to unfilter before sending the report. I've watched an experienced QA engineer spend forty minutes rebuilding a pivot table that broke because Excel decided to reinterpret dates.&lt;/p&gt;

&lt;p&gt;Auditi gives you filters by journey, tester, status, severity, platform, WCAG level, device, and date. If you've ever tried to find "that one iOS Safari fail from last Tuesday" in a spreadsheet, you understand why this matters.&lt;/p&gt;
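
&lt;p&gt;To make that concrete, here is roughly what one audit result looks like as a record, and what the filter query replaces. Plain JavaScript, and purely illustrative; this is not Auditi's actual schema:&lt;/p&gt;

```javascript
// Illustrative audit-result records: one per journey step, per tester.
const results = [
  { journey: 'Login', step: 3, status: 'fail', severity: 'high',
    tester: 'ana', platform: 'iOS Safari', wcag: '1.1.1', date: '2026-04-07' },
  { journey: 'Login', step: 3, status: 'pass', severity: null,
    tester: 'dan', platform: 'Chrome', wcag: '1.1.1', date: '2026-04-08' },
];

// "That one iOS Safari fail from last Tuesday" becomes a query, not a hunt.
function filterResults(records, criteria) {
  return records.filter((r) =>
    Object.entries(criteria).every(([key, value]) => r[key] === value)
  );
}

const safariFails = filterResults(results, { platform: 'iOS Safari', status: 'fail' });
```

&lt;p&gt;Every field is a filterable dimension, which is exactly what a spreadsheet loses the moment someone renames a column.&lt;/p&gt;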

&lt;h3&gt;
  
  
  What we actually built
&lt;/h3&gt;

&lt;p&gt;Assignment dialogs and review queues, so work gets distributed without a Slack message chain. Pass/fail/N-A toggles per step, because that's the atomic unit of an accessibility audit. Notifications for deadlines and invites, because relying on people to check a spreadsheet daily doesn't work.&lt;/p&gt;

&lt;p&gt;Then analytics. Pass rate over time. A WCAG compliance matrix. Breakdown by tester, by platform, by severity. This is the part that managers actually care about, and the part that's almost impossible to maintain in a spreadsheet without a dedicated person updating charts.&lt;/p&gt;

&lt;p&gt;Reports export to Excel, PDF, and CSV. We kept that because the people who receive accessibility reports often live in those formats. Auditi generates Overview, Detailed, and Matrix reports.&lt;/p&gt;

&lt;p&gt;We also added an AI-powered Smart Report that produces an executive summary, scores by WCAG level, flags top issues by priority, and suggests fixes. I'll be honest about this: AI summarization is useful here because it's compressing structured data, not making judgment calls. The tester still decides what passes and what fails. The AI just writes the summary you'd otherwise spend an hour drafting.&lt;/p&gt;

&lt;h3&gt;
  
  
  If you've tried to make a React app WCAG compliant
&lt;/h3&gt;

&lt;p&gt;Here's where I want to talk to the developers reading this, because the auditing side is only half the problem. The other half is actually fixing things.&lt;/p&gt;

&lt;p&gt;We run thirteen products in the BetterQA ecosystem. Different stacks: Vite/React SPAs, Next.js apps, a Laravel app, a WordPress site. Earlier this year we decided to do an accessibility sweep across all of them using our own scanner tool, which runs axe-core via Playwright.&lt;/p&gt;

&lt;p&gt;The results were humbling.&lt;/p&gt;

&lt;p&gt;Eight of our thirteen sites had accessibility scores below 60. The single biggest offender? Color contrast. Specifically, Tailwind's &lt;code&gt;purple-400&lt;/code&gt; on a white background.&lt;/p&gt;

&lt;p&gt;Every Vite/React SPA in our ecosystem used &lt;code&gt;text-purple-400&lt;/code&gt; for links, badges, labels, secondary text. It's a nice color. It also has a contrast ratio of roughly 4:1 against white. WCAG AA requires 4.5:1 for normal text. We were failing dozens of contrast checks per page, across eight different sites, and nobody had noticed because the pages looked fine to us.&lt;/p&gt;

&lt;p&gt;The fix: switch to &lt;code&gt;purple-600&lt;/code&gt; (#9333ea), which gives you about 5.4:1, comfortably above the threshold. We made the change across all eight sites. One of them, BetterFlow (a Laravel/Blade app), needed 24 individual class updates in its Blade templates.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight css"&gt;&lt;code&gt;&lt;span class="c"&gt;/* Before: 3.3:1 contrast - fails WCAG AA */&lt;/span&gt;
&lt;span class="nc"&gt;.text-purple-400&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nl"&gt;color&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="m"&gt;#a855f7&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c"&gt;/* After: 4.6:1 contrast - passes WCAG AA */&lt;/span&gt;
&lt;span class="nc"&gt;.text-purple-600&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nl"&gt;color&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="m"&gt;#9333ea&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
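
&lt;p&gt;If you want to verify ratios like these yourself, the math comes straight from the WCAG 2.x definition: linearize each sRGB channel, take the relative luminance, compare. A minimal sketch in plain JavaScript (the formula is the standard one; the helper names are made up):&lt;/p&gt;

```javascript
// WCAG 2.x relative luminance of a hex color like '#9333ea'.
function luminance(hex) {
  const channel = (i) => {
    const c = parseInt(hex.slice(i, i + 2), 16) / 255;
    // sRGB linearization per the WCAG definition
    return c > 0.03928 ? Math.pow((c + 0.055) / 1.055, 2.4) : c / 12.92;
  };
  return 0.2126 * channel(1) + 0.7152 * channel(3) + 0.0722 * channel(5);
}

// Contrast ratio between two colors, always reported as big:small.
function contrastRatio(a, b) {
  const [hi, lo] = [luminance(a), luminance(b)].sort((x, y) => y - x);
  return (hi + 0.05) / (lo + 0.05);
}

contrastRatio('#a855f7', '#ffffff'); // below the 4.5:1 AA minimum for normal text
contrastRatio('#9333ea', '#ffffff'); // clears it
```

&lt;p&gt;It is worth running a check like this on your base tokens before shipping a palette change, rather than trusting a design tool's rounding.&lt;/p&gt;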



&lt;p&gt;That got us from the 50s to the 70s and 80s in accessibility scores. But it only caught the low-hanging fruit.&lt;/p&gt;

&lt;h3&gt;
  
  
  The deeper fixes taught us more
&lt;/h3&gt;

&lt;p&gt;After the color contrast sweep, we went deeper. Sites like jrny.ro had icon-only buttons with no accessible name. Three buttons that a screen reader would announce as just "button." Fix: add &lt;code&gt;aria-label&lt;/code&gt; attributes.&lt;/p&gt;

&lt;p&gt;On menute.ro, we found eight form inputs and selects with no labels. A sighted user sees the placeholder text and understands the field. A screen reader user hears nothing useful. Fix: &lt;code&gt;aria-label&lt;/code&gt; on each input.&lt;/p&gt;

&lt;p&gt;The one that taught us the most was nis2manager.ro. The site uses CSS custom properties for its primary color. The original &lt;code&gt;--primary&lt;/code&gt; value was set to an oklch lightness of 0.65. Changing it to 0.48 fixed over seventy contrast violations in one line of CSS. Seventy. From a single variable change.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight css"&gt;&lt;code&gt;&lt;span class="c"&gt;/* One variable, seventy fixes */&lt;/span&gt;
&lt;span class="nt"&gt;--primary&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="nt"&gt;oklch&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="err"&gt;0&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="err"&gt;48&lt;/span&gt; &lt;span class="err"&gt;0&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="err"&gt;2&lt;/span&gt; &lt;span class="err"&gt;270&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;  &lt;span class="c"&gt;/* was 0.65 */&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's the lesson we keep coming back to: if your design system uses CSS custom properties or Tailwind theme colors, check the contrast of your base tokens first. You can hunt individual elements for hours, or you can fix the source and watch dozens of violations disappear.&lt;/p&gt;

&lt;h3&gt;
  
  
  What automated tools actually catch
&lt;/h3&gt;

&lt;p&gt;I want to be direct about this because I've seen too many articles claim that automated accessibility testing solves the problem. It doesn't. Not even close.&lt;/p&gt;

&lt;p&gt;Automated tools like axe-core catch maybe 30-40% of WCAG issues. They're good at color contrast, missing alt text, missing form labels, duplicate IDs, and broken ARIA attributes. They're bad at everything that requires context: whether alt text is actually meaningful, whether focus order makes sense, whether a custom widget is operable with a keyboard, whether content is understandable when read linearly by a screen reader.&lt;/p&gt;

&lt;p&gt;WCAG has roughly 80 success criteria across levels A, AA, and AAA. Automated tools can reliably check maybe 25-30 of them. The rest need a human who understands the user flow, the intent of the content, and what the experience is like without a mouse.&lt;/p&gt;

&lt;p&gt;That's why Auditi is structured around human-driven audits with journeys and steps, not around automated scan results. The automation is useful for catching regressions. It's not a substitute for a tester who actually navigates the site with a screen reader.&lt;/p&gt;

&lt;h3&gt;
  
  
  What we got wrong
&lt;/h3&gt;

&lt;p&gt;A few things, since we're being honest.&lt;/p&gt;

&lt;p&gt;The first version of Auditi was too complex. We modeled every WCAG criterion as a separate audit point, which meant testers had to click through dozens of criteria per step. Most of those were not applicable. We simplified it to let testers mark what matters and skip the rest.&lt;/p&gt;

&lt;p&gt;We also underestimated how important the export format is. Early exports were clean but didn't match what compliance officers expected. We had to add specific report layouts that mapped to the documentation formats our healthcare and government clients were already using.&lt;/p&gt;

&lt;p&gt;And our own ecosystem sweep revealed that we'd been shipping inaccessible products while building an accessibility tool. That stung. We fixed it, but it's a good reminder that building a tool and actually using it consistently are two different things.&lt;/p&gt;

&lt;h3&gt;
  
  
  The honest numbers
&lt;/h3&gt;

&lt;p&gt;Our ecosystem scores before and after the sweep:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Site&lt;/th&gt;
&lt;th&gt;Before&lt;/th&gt;
&lt;th&gt;After&lt;/th&gt;
&lt;th&gt;Primary fix&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;betterqa.co&lt;/td&gt;
&lt;td&gt;54&lt;/td&gt;
&lt;td&gt;84&lt;/td&gt;
&lt;td&gt;Plugin color updates&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;betterflow.eu&lt;/td&gt;
&lt;td&gt;55&lt;/td&gt;
&lt;td&gt;84&lt;/td&gt;
&lt;td&gt;24 Blade template fixes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;auditi.ro&lt;/td&gt;
&lt;td&gt;54&lt;/td&gt;
&lt;td&gt;82&lt;/td&gt;
&lt;td&gt;purple-400 to purple-600&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;electricworks.ro&lt;/td&gt;
&lt;td&gt;53&lt;/td&gt;
&lt;td&gt;80&lt;/td&gt;
&lt;td&gt;Tailwind primary classes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;psysign.ro&lt;/td&gt;
&lt;td&gt;53&lt;/td&gt;
&lt;td&gt;80&lt;/td&gt;
&lt;td&gt;Same pattern&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;nis2manager.ro&lt;/td&gt;
&lt;td&gt;51&lt;/td&gt;
&lt;td&gt;76&lt;/td&gt;
&lt;td&gt;CSS custom property (one-liner)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;factos.ro&lt;/td&gt;
&lt;td&gt;47&lt;/td&gt;
&lt;td&gt;72&lt;/td&gt;
&lt;td&gt;Same pattern&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Not perfect scores. Not even close. But a 25-30 point jump across eight sites, and a process we can repeat.&lt;/p&gt;

&lt;h3&gt;
  
  
  Where this matters for your stack
&lt;/h3&gt;

&lt;p&gt;If you're running a React or Vite SPA and you haven't run axe-core against it, do that first. You'll probably find contrast issues, missing labels, and button-name violations. Those are fixable in an afternoon.&lt;/p&gt;

&lt;p&gt;After that, the harder work begins. Keyboard navigation. Focus management in modals and dynamic content. Screen reader announcements for state changes. That's where spreadsheets fall apart and you need actual audit tracking.&lt;/p&gt;

&lt;p&gt;We built Auditi because we couldn't do that work well with the tools we had. It's at &lt;a href="https://auditi.ro" rel="noopener noreferrer"&gt;auditi.ro&lt;/a&gt; if you want to look at it.&lt;/p&gt;

&lt;p&gt;For more about how we approach QA across different domains, there's the &lt;a href="https://betterqa.co/blog" rel="noopener noreferrer"&gt;BetterQA blog&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>testing</category>
      <category>ai</category>
      <category>automation</category>
      <category>a11y</category>
    </item>
    <item>
      <title>Your staging environment is lying to you</title>
      <dc:creator>Tudor Brad</dc:creator>
      <pubDate>Thu, 09 Apr 2026 07:45:24 +0000</pubDate>
      <link>https://dev.to/tudorsss-betterqa/your-staging-environment-is-lying-to-you-11em</link>
      <guid>https://dev.to/tudorsss-betterqa/your-staging-environment-is-lying-to-you-11em</guid>
      <description>&lt;p&gt;I got a call from a client on a Tuesday morning. Their checkout flow was broken in production. Users couldn't complete purchases. Revenue was bleeding.&lt;/p&gt;

&lt;p&gt;The thing is, their staging regression suite had passed. Every test green. The deployment went through without a hitch. And yet real users were hitting a payment confirmation page that spun forever, because a third-party webhook URL had been updated in production but not in the staging environment config.&lt;/p&gt;

&lt;p&gt;Their regression tests checked that the checkout flow worked. They didn't check that the checkout flow worked with the actual production webhook endpoint, because staging had its own endpoint, and that one was fine.&lt;/p&gt;

&lt;p&gt;This is not a rare story. I run QA operations across teams in 24 countries, and this exact pattern shows up every few weeks. A team invests serious effort into staging regression tests, those tests pass, and production breaks anyway. Not because the tests were wrong, but because the tests were answering the wrong question.&lt;/p&gt;

&lt;h3&gt;
  
  
  The "test it in staging" mindset
&lt;/h3&gt;

&lt;p&gt;Most dev teams have a version of this workflow. Code gets written. It goes through code review. It lands in staging. Someone runs the test suite. If it's green, it ships.&lt;/p&gt;

&lt;p&gt;The problem is that staging is a simulation. It's supposed to mirror production, but it never fully does. Different data volumes. Different third-party configurations. Different network conditions. Sometimes different infrastructure entirely.&lt;/p&gt;

&lt;p&gt;When you make staging the primary place where quality gets verified, you've placed a bet that your simulation is accurate enough to catch real problems. And that bet fails more often than anyone likes to admit.&lt;/p&gt;

&lt;p&gt;I've seen a team lose three days debugging a production outage caused by a database migration that worked perfectly in staging against 500 test records but locked the table for 40 minutes against 2.3 million production records. Their migration test passed. The test was useless because it tested the wrong scale.&lt;/p&gt;
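&lt;p&gt;You can make that scale gap visible before staging. A minimal sketch, using SQLite purely as a stand-in for a real database (engine, table, and row counts are all illustrative), times the same schema change at two data volumes:&lt;/p&gt;

```python
import sqlite3
import time

def time_migration(row_count):
    """Time an index build against a table of row_count rows.
    SQLite is only a stand-in here; the point is the data volume,
    not the engine."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, email TEXT)")
    conn.executemany(
        "INSERT INTO orders (email) VALUES (?)",
        ((f"user{i}@example.com",) for i in range(row_count)),
    )
    conn.commit()
    start = time.perf_counter()
    # The "migration": an index build that is instant on 500 rows
    # and can lock a table for minutes on millions.
    conn.execute("CREATE INDEX idx_orders_email ON orders (email)")
    elapsed = time.perf_counter() - start
    conn.close()
    return elapsed

staging_like = time_migration(500)
production_like = time_migration(200_000)
print(f"500 rows: {staging_like:.4f}s, 200k rows: {production_like:.4f}s")
```

&lt;p&gt;Even a rough harness like this turns "the migration passed in staging" into "the migration takes this long at production scale," which is the question that actually matters.&lt;/p&gt;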

&lt;h3&gt;
  
  
  What regression tests actually verify
&lt;/h3&gt;

&lt;p&gt;Let me be specific about what regression tests in staging do well. They verify that previously working features still work after new code is introduced. If your login page worked last sprint and still works this sprint, that's regression testing doing its job.&lt;/p&gt;

&lt;p&gt;But here's what regression tests in staging don't verify:&lt;/p&gt;

&lt;p&gt;They don't verify that new features work under real user conditions. They check happy paths and known edge cases, not the creative ways actual humans interact with your product.&lt;/p&gt;

&lt;p&gt;They don't verify that your environment configuration matches production. Staging has its own secrets, its own endpoints, its own feature flags. Any mismatch is invisible to your test suite.&lt;/p&gt;

&lt;p&gt;They don't verify user workflows end-to-end across service boundaries. A user doesn't click one button and stop. They navigate through five screens, interact with three services, hit two payment providers, and receive an email. Your regression tests probably don't cover that full chain.&lt;/p&gt;

&lt;p&gt;And they certainly don't verify what happens when things go wrong. What does your app do when the payment provider returns a timeout instead of a success? Your staging tests probably don't simulate that.&lt;/p&gt;
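&lt;p&gt;That failure path is cheap to test once the provider call is injected as a dependency. A sketch with invented names (no real payment SDK, and the states are assumptions):&lt;/p&gt;

```python
# Hypothetical checkout handler: names and states are illustrative.
class ProviderTimeout(Exception):
    pass

def confirm_payment(call_provider):
    """Return a deliberate order state instead of crashing when the
    provider times out -- the failure path staging rarely exercises."""
    try:
        result = call_provider()
    except ProviderTimeout:
        # Degrade gracefully: mark the order for reconciliation by a
        # background job rather than showing a spinner forever.
        return {"state": "pending_review", "retry": True}
    return {"state": "confirmed", "charge_id": result["charge_id"]}

def happy_provider():
    return {"charge_id": "ch_123"}

def slow_provider():
    raise ProviderTimeout("no response within 2s")

print(confirm_payment(happy_provider))  # confirmed
print(confirm_payment(slow_provider))   # pending_review, not a crash
```

&lt;p&gt;The slow-provider case is the one real users hit, and the answer should be a deliberate state transition, not whatever the stack trace happens to produce.&lt;/p&gt;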

&lt;h3&gt;
  
  
  The real cost of late discovery
&lt;/h3&gt;

&lt;p&gt;Bugs found in production cost 10-100x more to fix than bugs found during development. The exact multiplier is debated, but the figure traces back to IBM and Barry Boehm's research in the 1970s, and the underlying trend, that fixes get more expensive the later a defect is found, has held up repeatedly since.&lt;/p&gt;

&lt;p&gt;But the dollar cost isn't even the worst part. The worst part is the context switch. When a production bug surfaces, the developer who wrote the code three weeks ago has to stop what they're doing, reload all the context for that feature, reproduce the issue, fix it, and ship a hotfix. That developer loses half a day minimum, and whatever they were working on gets delayed.&lt;/p&gt;

&lt;p&gt;Multiply this by the average number of production bugs per sprint and you're looking at a real drag on velocity that nobody tracks because it looks like "unplanned work" in the sprint metrics.&lt;/p&gt;

&lt;p&gt;One client tracked this explicitly for a quarter. They found that production bug fixes consumed 22% of their engineering capacity. Nearly a quarter of their team's time was spent fixing things that should have been caught before release.&lt;/p&gt;

&lt;h3&gt;
  
  
  Where bugs actually come from
&lt;/h3&gt;

&lt;p&gt;When we do root cause analysis on production bugs that passed staging regression, the same categories come up repeatedly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Environment differences.&lt;/strong&gt; Config values, feature flags, API endpoints, database sizes. Staging says yes, production says no.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Untested user workflows.&lt;/strong&gt; The regression suite tests individual features. Nobody tested the workflow where a user starts on mobile, switches to desktop midway through a multi-step form, and submits. That workflow broke because session handling differed between the two.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Integration timing.&lt;/strong&gt; Service A calls Service B. In staging, Service B responds in 50ms. In production under load, Service B responds in 3 seconds. The calling code had a 2-second timeout that nobody tested against realistic latency.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Data shape surprises.&lt;/strong&gt; Your tests use clean, well-formed test data. Production data has nulls where you don't expect them, Unicode characters in names, addresses with 8 lines, phone numbers with country codes your validation doesn't handle.&lt;/p&gt;

&lt;p&gt;None of these are caught by running the same regression suite one more time.&lt;/p&gt;
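&lt;p&gt;The data-shape category in particular is easy to probe: collect the kinds of messy values production actually contains and push them through your normalization. The field and rule below are assumptions; the samples are the realistic part:&lt;/p&gt;

```python
# Illustrative validator check: the function is made up, but the
# samples are the shapes production data actually contains.
def normalize_name(raw):
    if raw is None:
        return ""  # production has nulls where test fixtures never do
    return raw.strip()

messy_samples = [
    None,                 # null where a string was expected
    "José García",        # accented characters
    "山田 太郎",           # non-Latin script
    "  O'Brien-Smith  ",  # stray whitespace and punctuation
    "x" * 500,            # absurdly long input
]

for sample in messy_samples:
    normalized = normalize_name(sample)
    assert isinstance(normalized, str)  # no crash on any shape
```

&lt;p&gt;Clean fixtures will never exercise these branches; a small corpus of production-shaped samples will.&lt;/p&gt;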

&lt;h3&gt;
  
  
  Testing earlier, not just testing more
&lt;/h3&gt;

&lt;p&gt;The fix isn't to write more regression tests and run them more often. The fix is to test different things at different stages.&lt;/p&gt;

&lt;p&gt;In development, before code even reaches staging, you should be running integration tests against realistic data. Not the full suite, just the tests relevant to the change being made. If a developer changes the checkout flow, they should run the checkout integration tests locally, with production-scale data if possible, before pushing the PR.&lt;/p&gt;
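&lt;p&gt;Test runners support this through markers or tags; with pytest, for example, that is @pytest.mark.checkout plus pytest -m checkout. The mechanism is simple enough to sketch in plain Python, with made-up tests:&lt;/p&gt;

```python
# Minimal tag-based selection, standing in for what pytest markers
# give you: run only the checkout tests for a checkout change.
REGISTRY = []

def tagged(*tags):
    def wrap(fn):
        REGISTRY.append((set(tags), fn))
        return fn
    return wrap

@tagged("checkout")
def test_checkout_totals():
    assert 3 * 7 == 21  # placeholder assertion

@tagged("auth")
def test_login_redirect():
    assert True  # placeholder assertion

def run(tag):
    ran = []
    for tags, fn in REGISTRY:
        if tag in tags:
            fn()
            ran.append(fn.__name__)
    return ran

print(run("checkout"))  # only the checkout test runs
```

&lt;p&gt;The point is scoping: a checkout change triggers the checkout suite before the PR, not the full regression run in staging two days later.&lt;/p&gt;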

&lt;p&gt;During code review, someone should be asking: "What user workflow does this change affect, and have we tested that workflow end-to-end?" Not "does this function return the right value," but "can a user still complete their purchase after this change?"&lt;/p&gt;

&lt;p&gt;In staging, yes, run the regression suite. But also run exploratory tests. Have a human actually use the feature the way a customer would. Click around. Try unexpected inputs. Navigate away mid-process and come back. These are the tests that catch the bugs automation misses.&lt;/p&gt;

&lt;h3&gt;
  
  
  Capturing what users actually do
&lt;/h3&gt;

&lt;p&gt;One of the hardest parts of testing user workflows is knowing what those workflows are. Developers and testers don't use the product the same way customers do. They know the shortcuts. They avoid the rough edges. They don't make the mistakes that real users make every day.&lt;/p&gt;

&lt;p&gt;This is why we built &lt;a href="https://betterqa.co/flows" rel="noopener noreferrer"&gt;Flows&lt;/a&gt;, a browser test recorder that captures real user interactions. Instead of guessing which workflows matter, you record them. A tester walks through the actual user journey, Flows captures every click, every navigation, every form input, and turns it into a repeatable test. When someone says "test the checkout flow," you're not testing a developer's idea of the checkout flow. You're testing what users actually do.&lt;/p&gt;

&lt;p&gt;The difference matters. We've had cases where the developer-written test covered 6 steps and the recorded user workflow covered 14 steps, because users do things like checking their cart twice, editing quantities, applying a coupon code, removing the coupon, adding a different one, and then checking out. The 6-step test passed. The 14-step test found a state management bug that corrupted the cart after coupon removal.&lt;/p&gt;
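&lt;p&gt;Expressed as a test, the longer workflow looks like this. The cart model is a toy, not the client's code, but the step sequence is the kind a recorded session produces:&lt;/p&gt;

```python
# Toy cart model to show why the recorded, longer workflow matters.
class Cart:
    def __init__(self):
        self.lines = {}      # sku -> (price, qty)
        self.discount = 0.0

    def add(self, sku, price, qty=1):
        _, old_qty = self.lines.get(sku, (price, 0))
        self.lines[sku] = (price, old_qty + qty)

    def set_qty(self, sku, qty):
        price, _ = self.lines[sku]
        self.lines[sku] = (price, qty)

    def apply_coupon(self, pct):
        self.discount = pct

    def remove_coupon(self):
        self.discount = 0.0

    def total(self):
        subtotal = sum(p * q for p, q in self.lines.values())
        return round(subtotal * (1 - self.discount), 2)

# The recorded-workflow style walkthrough: edit, apply, remove, re-apply.
cart = Cart()
cart.add("mug", 12.00, qty=2)
cart.set_qty("mug", 3)       # user edits quantity
cart.apply_coupon(0.10)      # applies a coupon
cart.remove_coupon()         # removes it
cart.apply_coupon(0.20)      # applies a different one
assert cart.total() == round(36.00 * 0.80, 2)
```

&lt;p&gt;A 6-step happy-path test never executes the remove-then-re-apply transition, which is exactly where stale-state bugs hide.&lt;/p&gt;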

&lt;h3&gt;
  
  
  Tracking what escapes
&lt;/h3&gt;

&lt;p&gt;The other half of this is tracking what gets past your testing. If a bug makes it to production, that's data. Not just "fix it and move on" data, but "why did this escape and how do we prevent the next one" data.&lt;/p&gt;

&lt;p&gt;We built &lt;a href="https://bugboard.co" rel="noopener noreferrer"&gt;BugBoard&lt;/a&gt; partly for this reason. When a production bug gets reported, it goes into BugBoard with full context: what the user was doing, what they expected, what happened instead. But more importantly, we tag escape analysis on it. Did we have a test for this scenario? If yes, why didn't it catch it? If no, should we?&lt;/p&gt;

&lt;p&gt;Over time, this builds a picture of your testing gaps. You stop seeing production bugs as random bad luck and start seeing them as predictable failures in specific categories. Maybe your team consistently misses accessibility regressions. Maybe edge cases in multi-currency handling always slip through. The pattern tells you where to invest your testing effort.&lt;/p&gt;
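&lt;p&gt;The aggregation itself is trivial once escapes are tagged. A sketch with an invented tagging scheme (not BugBoard's actual data model):&lt;/p&gt;

```python
from collections import Counter

# Hypothetical escape-analysis records; fields are illustrative.
escapes = [
    {"id": 101, "category": "env-mismatch", "had_test": False},
    {"id": 102, "category": "data-shape",   "had_test": False},
    {"id": 103, "category": "env-mismatch", "had_test": False},
    {"id": 104, "category": "a11y",         "had_test": True},
    {"id": 105, "category": "env-mismatch", "had_test": False},
]

by_category = Counter(e["category"] for e in escapes)
untested = sum(1 for e in escapes if not e["had_test"])

print(by_category.most_common(1))  # [('env-mismatch', 3)] -> invest here
print(f"{untested}/{len(escapes)} escapes had no test at all")
```

&lt;p&gt;Five records tell you nothing; a quarter's worth makes the biggest category, and the share of escapes with no test at all, hard to argue with.&lt;/p&gt;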

&lt;h3&gt;
  
  
  What staging should actually be for
&lt;/h3&gt;

&lt;p&gt;I'm not saying staging is useless. Staging is valuable when it's treated as a final validation step, not the first real quality check.&lt;/p&gt;

&lt;p&gt;By the time code reaches staging, you should already be confident that it works. Unit tests passed. Integration tests passed. A human walked through the user workflow at least once. Code review caught the obvious architectural problems.&lt;/p&gt;

&lt;p&gt;Staging should be confirming what you already believe: this is ready to ship. It should be catching the rare environmental issues and last-minute integration problems. It should not be the place where you discover that a core feature is broken. If that happens regularly, your upstream testing has gaps that no amount of staging regression can fill.&lt;/p&gt;

&lt;p&gt;Think of it like proofreading. If you hand a document to a proofreader and they find it's missing three chapters, something went wrong long before the proofreading stage. Proofreading catches typos and formatting issues. It assumes the content is already complete and coherent. Staging should work the same way.&lt;/p&gt;

&lt;h3&gt;
  
  
  A checklist for teams stuck in the staging trap
&lt;/h3&gt;

&lt;p&gt;If your team keeps finding significant bugs in staging or, worse, in production after staging regression passes, here's where to start.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Audit your environment parity.&lt;/strong&gt; List every configuration difference between staging and production. API keys, feature flags, database sizes, third-party endpoints. Any difference is a potential blind spot.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Map your user workflows.&lt;/strong&gt; Not your test cases, your actual user workflows. Talk to support. Read the tickets. Watch session recordings if you have them. The gap between what you test and what users do is where production bugs live.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Test with production-scale data.&lt;/strong&gt; If your staging database has 500 records and production has 5 million, your staging tests are performance theater. Either scale up your test data or run specific performance checks against production-like volumes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Track your escapes.&lt;/strong&gt; Every production bug should trigger a brief retrospective. Not a blame session, just a question: could we have caught this earlier, and if so, how?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Move testing left.&lt;/strong&gt; Not as a buzzword, as a calendar event. Integration tests before code review. User workflow tests before staging. Staging becomes confirmation, not discovery.&lt;/p&gt;
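&lt;p&gt;The first item, the environment parity audit, is the most mechanical and the easiest to script. A sketch that diffs two flat config mappings (keys and values invented; it reports which keys differ without printing the values, so secrets stay out of the report):&lt;/p&gt;

```python
# Parity audit sketch: diff two flat config mappings by key.
def parity_report(staging, production):
    shared = set(staging).intersection(production)
    return {
        "only_staging": sorted(staging.keys() - production.keys()),
        "only_production": sorted(production.keys() - staging.keys()),
        # Report the key names only, never the values.
        "differing": sorted(k for k in shared if staging[k] != production[k]),
    }

staging = {
    "PAYMENT_WEBHOOK": "https://staging.example/hook",
    "FEATURE_NEW_CHECKOUT": "on",
    "DB_POOL_SIZE": "5",
}
production = {
    "PAYMENT_WEBHOOK": "https://prod.example/hook",
    "DB_POOL_SIZE": "50",
}

report = parity_report(staging, production)
print(report["differing"])     # ['DB_POOL_SIZE', 'PAYMENT_WEBHOOK']
print(report["only_staging"])  # ['FEATURE_NEW_CHECKOUT']
```

&lt;p&gt;Run something like this on every deploy and the Tuesday-morning webhook mismatch shows up as a line in a report instead of a broken checkout.&lt;/p&gt;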

&lt;p&gt;Your staging regression suite passing is not evidence that your product works. It's evidence that your staging environment is internally consistent. Those are different things, and the difference shows up in production.&lt;/p&gt;




&lt;p&gt;We work with dev teams who keep finding bugs too late in the cycle. More about how we approach testing at &lt;a href="https://betterqa.co/blog" rel="noopener noreferrer"&gt;betterqa.co/blog&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>testing</category>
      <category>automation</category>
      <category>a11y</category>
      <category>devops</category>
    </item>
  </channel>
</rss>
