DEV Community

Zhimin Zhan


Why Gherkin (Cucumber, SpecFlow,…) Always Failed with UI Test Automation?

This article was originally posted on Medium (2021-01), and featured in Medium's largest publication 'The Startup' and some software testing newsletters. This is also included in “Be aware of Fake Test Automation/DevOps Engineers” series.

Many software projects have tried, or are trying, to use Cucumber for test automation, commonly with Selenium WebDriver for testing web apps. Some might wonder whether my title is just a personal, radical view to attract attention. No, I just reworded the view of Aslak Hellesøy, the creator of Cucumber:

“So really, what is Cucumber? As a test tool it sucks. There are far better automated test tools” (source)

Some Gherkin fans may say: “this might be a one-off remark”. Oh well, here is another one, from Aslak’s home page:

“If all you need is a testing tool for driving a mouse and a keyboard, don’t use Cucumber. There are other tools that are designed to do this with far less abstraction and typing overhead than Cucumber.” (source)

In my 10+ years of test automation and Continuous Testing consulting, every test automation attempt I have seen with Gherkin-style syntax (Cucumber, SpecFlow, JBehave, Concordion, Gauge, Spinach…) has failed, with no exceptions! The biggest failure I heard of: “The project spent 3 times the development effort (measured in time and money) trying to maintain those Cucumber tests; eventually, they were dumped in the bin!”

Why does Cucumber fail in test automation? Let’s hear from DHH (Creator of Ruby on Rails, Founder & CTO at Basecamp & HEY, NYT best-selling author):

DHH’s tweet in March 2011

DHH is correct, as usual. I have never met a single business customer who actually read (or even ran) Cucumber tests.


Once we establish that, it is clear that Cucumber tests have little value for customers (and, therefore, for business analysts). From the technical perspective, supporting the extra layer of a test-specific English parser will cost the team a lot (and, again, there is little or no business value in doing that).

Some might still argue: “I disagree. The creator of Cucumber and DHH were wrong; I implemented Cucumber successfully on my last project.” Every time someone said this to me and I had the chance to assess their work, it turned out to be a plain lie. Think about it: “the last project”? How about this project? Show us. For web test automation, the knowledge is almost fully transferable. On every new project, I created at least one working core test (given the environment was ready), usually running in a Continuous Testing server (e.g. BuildWise), on the first day. Yes, you read that right: on Day 1.

What is the main technical reason for Cucumber (i.e. Gherkin) failing at test automation? Test script maintenance. The first few test cases are easy to write: relatively simple and static (e.g. login, sign up). Some people get excited: executable specifications that read like English, woo-hoo! But as more tests are added, test cases get more complicated, and the maintenance effort (for existing and new tests) grows quickly (exponentially, if the team lacks the tools and capability to meet the challenges). That is the case for all UI test automation, even with good syntax frameworks such as RSpec.

If someone thinks maintaining automated UI tests is easy, please read this interview with Microsoft test guru Alan Page: “95% of the time, 95% of test engineers will write bad GUI automation just because it’s a very difficult thing to do correctly”. If you think GUI automation is easy, then, by Alan Page’s numbers (5% × 5% = 1/400), you are at the level of the top 1 in 400 SETs (Software Engineers in Test). Please note, this is judged by the Microsoft engineer standard. How many test engineers in your city (or state) can meet that standard? Not many, right? Only 1 in 400 of them can do real GUI test automation well, and they probably won’t think it is easy. This is not just Microsoft’s view; Google VP Patrick Copeland said this: “In my experience, great developers do not always make great testers, but great testers (who also have strong design skills) can make great developers. It’s a mindset and a passion. … They are gold”.

Some might feel intimidated: “if test automation is that hard, we should probably give up and just do manual testing”. Please don’t. Without solid E2E test automation, there is no real Agile. Test automation (and later Continuous Testing) is challenging, but not impossible. Face the challenge and take proper actions, such as seeking help from a qualified test automation coach (self-learning is possible, but more difficult), choosing the right frameworks (such as raw Selenium WebDriver + RSpec), a highly effective testing tool such as TestWise, and a dedicated Continuous Testing server such as BuildWise. (By the way, you can start with all of the above frameworks/tools for FREE.) It can be a very rewarding journey, with positive results in a short time.

However, it is very easy to make bad decisions along the way, and choosing Gherkin syntax is one of them.

Because of that extra layer (useless except for demos), Cucumber tests require more maintenance effort, a lot more. That’s why, in my experience over 10 years, every Gherkin automation attempt has failed.

A recent big failure was at a large finance organization in my city (one that has claimed to be Agile for 12+ years). Its Gherkin solution (in Java) was so bad that about 3 times the development effort (time and money) was spent trying to maintain those ‘Gherkin BDD tests’. Of course, the tests were eventually abandoned. The excuse management used was “the contractors who worked on this left, so we failed to maintain it”. Of course, this was not the truth. The root problem was a bunch of mediocre programmers who overestimated their knowledge of test automation and made a bad choice based on the naive idea of executing the “Given-When-Then” user stories that Business Analysts wrote. Sadly, these kinds of mistakes keep being repeated.

Besides the human factors, what are the technical reasons why DHH and the creator of Cucumber are against BDD with Gherkin tests? Below is a comparison of the test tiers (based on Maintainable Automated Test Design) between a good test syntax framework RSpec and the bad Gherkin (when used for test automation).

test tiers

The extra effort (right graph) comes from the ‘test-specific English parser’, the part DHH was referring to. Let’s look at an example Cucumber test.

1. Test (Gherkin) Layer

Feature: Select Flights 
  As a registered user
  I can select flights

  Scenario: Oneway Trip
    Given I am signed in as "agileway"
    When select oneway trip
    And depart from "Sydney" to "New York" on "07" of "May 2021"
    And click "Continue"
    Then I should see "2021-05-07", "New York" and "Sydney"

2. Step Definitions Layer!

Given /^I am signed in as "(.*?)"$/ do |user|
  sign_in(user, "testwise")  # sign in as the user named in the step, not a hard-coded one
  @flight_page = FlightPage.new(driver)
end

When /^select oneway trip$/ do
  # ...
end

When /^click "(.*?)"$/ do |arg1|
  # ...
  sleep 0.5
end

When /^depart from "(.*?)" to "(.*?)" on "(.*?)" of "(.*?)"$/ do |from, to, day, month_year|
  # ...
end

Then /^I should see "(.*?)", "(.*?)" and "(.*?)"$/ do |t1, t2, t3|
  the_page_source = driver.page_source
  expect(the_page_source).to include(t1)
  expect(the_page_source).to include(t2)
  expect(the_page_source).to include(t3)
end
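To make the ‘test-specific English parser’ more concrete, here is a minimal, hypothetical sketch in plain Ruby (no Cucumber involved; `StepRegistry`, `define` and `run` are invented names, not Cucumber’s API) of what this layer asks a framework to do: match each line of English against every registered regex, then call the matching block with the captured strings. It also shows the maintenance trap: reword a step in the feature file and the link to the code breaks until someone updates the regex.

```ruby
# Hypothetical sketch of a Gherkin-style step matcher (NOT Cucumber's implementation).
class StepRegistry
  # A step definition: a regex and the block to call when it matches.
  StepDef = Struct.new(:pattern, :handler)

  def initialize
    @steps = []
  end

  # Register a step definition, similar in spirit to Cucumber's Given/When/Then.
  def define(pattern, &handler)
    @steps << StepDef.new(pattern, handler)
  end

  # Match the step text against every regex; exactly one must match.
  # The regex captures become the block arguments.
  def run(text)
    matches = @steps.map { |step| [step, step.pattern.match(text)] }
                    .select { |_step, match| match }
    raise ArgumentError, "Undefined step: #{text}" if matches.empty?
    raise ArgumentError, "Ambiguous step: #{text}" if matches.size > 1

    step, match = matches.first
    step.handler.call(*match.captures)
  end
end

registry = StepRegistry.new
registry.define(/^depart from "(.*?)" to "(.*?)" on "(.*?)" of "(.*?)"$/) do |from, to, day, month_year|
  "#{from} -> #{to}, #{day} #{month_year}"
end

puts registry.run('depart from "Sydney" to "New York" on "07" of "May 2021"')
# prints: Sydney -> New York, 07 May 2021

# A small rewording in the feature file no longer matches any regex.
begin
  registry.run('depart from "Sydney" to "New York" on 7 May 2021')
rescue ArgumentError => e
  puts e.message
  # prints: Undefined step: depart from "Sydney" to "New York" on 7 May 2021
end
```

Every step in a feature file pays this cost: the English text, the regex and the block body must all be kept in sync by hand.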

3. Helper and Pages Tier

Helper: support/step_helper.rb, included in support/env.rb

def sign_in(user, pass)
  driver.find_element(:id, "username").send_keys(user)
  driver.find_element(:id, "password").send_keys(pass)
  driver.find_element(:xpath, "//input[@value='Sign in']").click
end

Page class: pages/flight_page.rb

require_relative "abstract_page"

class FlightPage < AbstractPage
  def initialize(driver)
    super(driver, "")
  end

  def select_trip_type(trip_type)
    driver.find_element(:xpath, "//input[@name='tripType' and @value='#{trip_type}']").click
  end

  # more functions ...
  def click_continue
    # ...
  end
end

Please note that the helper and page classes, if designed well, can be 100% reusable, regardless of which top-layer syntax framework you use: Cucumber, Capybara or RSpec.
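The reuse point can be illustrated with a small, self-contained sketch. It assumes a hypothetical `FakeDriver`/`FakeElement` pair that only mimics the shape of Selenium’s `find_element(how, what)` and `click` calls (so it runs without a browser), a cut-down `FlightPage` without the `AbstractPage` base class, and a trip-type value of `"oneway"`, which is a guess. The same page object is called from a Cucumber-style step body and from an RSpec-style example body; only the top layer differs.

```ruby
# Hypothetical stand-ins for a Selenium driver, so the sketch runs without a browser.
class FakeElement
  def initialize(log, locator)
    @log = log
    @locator = locator
  end

  # Record the click instead of performing it.
  def click
    @log << "click #{@locator}"
  end
end

class FakeDriver
  attr_reader :log

  def initialize
    @log = []
  end

  # Same argument shape as Selenium's find_element(how, what).
  def find_element(how, what)
    FakeElement.new(@log, "#{how}=#{what}")
  end
end

# Cut-down page object: the locator lives here and nowhere else.
class FlightPage
  def initialize(driver)
    @driver = driver
  end

  def select_trip_type(trip_type)
    @driver.find_element(:xpath, "//input[@name='tripType' and @value='#{trip_type}']").click
  end
end

# Top layer A: what a Cucumber step body would call (regex layer omitted).
cucumber_step = ->(page) { page.select_trip_type("oneway") }

# Top layer B: what an RSpec example body would call directly.
def rspec_example(page)
  page.select_trip_type("oneway")
end

driver_a = FakeDriver.new
cucumber_step.call(FlightPage.new(driver_a))

driver_b = FakeDriver.new
rspec_example(FlightPage.new(driver_b))

puts driver_a.log == driver_b.log
# prints: true
```

If the trip-type radio button’s locator changes, only `select_trip_type` needs updating, whichever top layer calls it.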

As a comparison, below is the test (top) layer for RSpec.

before(:all) do
  sign_in("agileway", "testwise")
end

it "One-way trip" do
  flight_page = FlightPage.new(driver)
  # ...
  flight_page.select_arrive_at("New York")
  flight_page.select_depart_month("May 2021")
  # ...
  expect(page_text).to include("2021-05-07 Sydney to New York")
end

There is no middle tier (the helper/page class tier is the same), so the test scripts are much easier to maintain. You can get the above test scripts from GitHub. For a more comprehensive example, see this article: WhenWise Regression Test Suite Reaches 500 Selenium tests and ~300K Test Executions.

RSpec (“Behaviour Driven Development for Ruby”) is the most popular BDD framework for Ruby. RSpec v3.8.0 alone has over 193 million downloads on RubyGems. Even allowing for the fact that RSpec is also used for unit and integration tests, that download count is impressive. By comparison, the most-downloaded Cucumber version, v3.1.2, has merely 8.8 million.

Cucumber is not the first test framework with English-like syntax to fail at automated testing (it may be useful for other purposes, but definitely not for real test automation). Do you still remember FitNesse (it was quite big about 10 years ago; an example here)? Now it is hardly mentioned.

Some frustrated Gherkin ‘test engineers’ might grumble: “Maybe you just don’t understand Cucumber”. Sorry, I do know Cucumber well.

  • Cucumber was developed in Ruby; I am a winner of the 10th Ruby Award. I also worked for many years as a senior software engineer (contractor) using Java, C# and JavaScript.

  • TestWise, a next-gen functional testing IDE I created, supports Cucumber. (Kent Beck, the father of Agile, once said: ‘I hated the idea so I had to try it.’)

  • BuildWise, an international award-winning Continuous Testing server I created, supports executing Cucumber tests too.

It is fair to say that my Cucumber/Gherkin knowledge is better than that of most ‘Cucumber automation engineers’.

Real functional test automation is far more than a fancy demo. If you truly believe Gherkin automation tests are the way to go, please do them well; don’t ruin the reputation of test automation. Make test automation visible and relevant to the team daily, enabling the team to release to production multiple times a day. That’s what I can do with raw Selenium WebDriver tests in RSpec for my web apps: ClinicWise, SiteWise, and WhenWise.

If you are an architect/manager of an organization doing Cucumber test automation (usually fooled by fake agile coaches), I suggest you write an email raising your concerns (but not too directly) and save it (that’s important), or even start preparing your scapegoat. When the shit hits the fan, you can say: “I told you so” or “that fake agile coach’s fault”.

Background image credit: Futurama S2E4


1. Do you mean BDD is just hype?

No. Check out BDD Clarified: BDD ≠ “Given-When-Then” (Gherkin).

2. My team is trying to introduce Gherkin, what can I do to stop that?

Check out A Practical Advice on Rejecting Gherkin for Test Automation.

