Almost every developer has at some point in their career encountered (or built) a piece of code that’s so intimidating, no one wants to touch it. Changing the code may be easy, but it has so many edge cases that breaking something is just as easy.
And of course, a bug will eventually come in that is caused by that code. No one wants to draw the short straw there. The fix may be easy, but the testing would take hours. And that assumes that all the edge cases are documented or known. (ha!)
Automated testing won’t make this piece of code pretty, but it can certainly make working with it easier. A test plan that takes several hours to go through manually can be run in mere seconds as an automated test suite. That not only saves time, but gives you confidence when making changes to some sensitive code. With automated integration tests, you may even have a much easier time rewriting the thing if you so choose.
There are a lot of topics under the umbrella of automated testing. In this post we’ll cover the following at a high level:
- Unit Testing
- Integration Testing
- Selenium Testing
- Limitations to Automated Testing
Unit testing is the most basic form of automated testing. It is what most people talk about when they talk about writing tests. The advantages of unit tests are that they are simple to understand, simple to write, and execute consistently.
Let’s say you have a function to calculate inches to centimetres: convert_inches_to_cm(inches_value)
One unit test would be to convert a simple case where 2 inches should be 5.08 cm:
result = convert_inches_to_cm(2)
assert result == 5.08
One unit test is rarely enough, so we add some tests for “edge” cases such as zero and negative values.
result = convert_inches_to_cm(0)
assert result == 0
result = convert_inches_to_cm(-2)
assert result == -5.08
Expected errors should also be tested to make sure our application doesn’t crash with bad input.
result = convert_inches_to_cm("yolo!")
assert result is None
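Put together, the cases above might look like the following sketch. The implementation of convert_inches_to_cm is an assumption (the post only shows its tests), including the choice to return None for non-numeric input. The tests compare with math.isclose rather than == because floating point arithmetic does not always land on the exact decimal you expect.

```python
import math

CM_PER_INCH = 2.54


def convert_inches_to_cm(inches_value):
    """Assumed implementation: return centimetres, or None for non-numeric input."""
    if not isinstance(inches_value, (int, float)):
        return None
    return inches_value * CM_PER_INCH


def test_simple_case():
    assert math.isclose(convert_inches_to_cm(2), 5.08)


def test_zero():
    assert convert_inches_to_cm(0) == 0


def test_negative():
    assert math.isclose(convert_inches_to_cm(-2), -5.08)


def test_bad_input():
    # Bad input should produce None instead of raising an exception.
    assert convert_inches_to_cm("yolo!") is None
```

A test runner such as pytest would pick up the test_ functions automatically; they can also be called directly.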
Unit tests are also great for more complicated functions. For example, Dynomantle scrapes webpages to provide link previews. Descriptions on a web page can be provided in a variety of ways. Here is what a test extracting the description from a Twitter meta tag looks like:
html = """<Some really long html from a sample site>"""
parser_obj = html_parser.HtmlParser(html_content=html)
descr = parser_obj.extract_description()
assert descr == 'An awesome blog on software development'
There are numerous other tests as well where the test code is exactly the same, but the html has been changed to account for the different case.
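One way to keep the shared test code in a single place is a table of (html, expected) pairs. Dynomantle's html_parser module is not shown in this post, so the sketch below uses a hypothetical stand-in, DescriptionParser, built on Python's standard html.parser. The list of meta tags it checks and their priority order are assumptions made for illustration.

```python
from html.parser import HTMLParser

# Assumed priority order for where a description may come from.
META_KEYS = ("twitter:description", "og:description", "description")


class DescriptionParser(HTMLParser):
    """Hypothetical stand-in for the real parser; collects description meta tags."""

    def __init__(self, html_content):
        super().__init__()
        self.found = {}
        self.feed(html_content)

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        key = attrs.get("name") or attrs.get("property")
        if key in META_KEYS and attrs.get("content"):
            # Keep the first value seen for each kind of tag.
            self.found.setdefault(key, attrs["content"])

    def extract_description(self):
        for key in META_KEYS:
            if key in self.found:
                return self.found[key]
        return None


# Same test code for every case; only the html and the expected value change.
CASES = [
    ('<meta name="twitter:description" content="An awesome blog">', "An awesome blog"),
    ('<meta property="og:description" content="An awesome blog">', "An awesome blog"),
    ('<meta name="description" content="An awesome blog">', "An awesome blog"),
    ("<p>No description here</p>", None),
]


def test_extract_description():
    for html, expected in CASES:
        parser_obj = DescriptionParser(html_content=html)
        assert parser_obj.extract_description() == expected
```

Adding a new way of providing a description then means adding one line to CASES instead of copying a whole test.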
Unit tests are excellent for functions that have all the logic they need contained inside them. These functions don’t interact with other parts of your code base. Most importantly, they don’t interact with ANY other systems like a database.
Take Dynomantle as an example. Say there was a unit test for adding a bookmark. To validate that the bookmark was added correctly, we would have to retrieve the bookmarks for a note. The first time we run this test, we end up with one bookmark in the database. The problem is that the second time we run the test, we’ll have two bookmarks. The third time will result in three bookmarks. The database is changing with every execution of the test which jeopardizes the consistency of the tests.
This problem can be avoided by using something called “mocks”. We won’t go too deep into mocks, because they avoid the problem rather than solve it. The goal of unit tests is to test bite-sized pieces of our application, but we still need to test our application as a whole. How the different pieces interact matters a great deal.
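For a rough idea of what a mock looks like, here is a minimal sketch using Python's unittest.mock. The add_bookmark function and the store object with save_bookmark and get_bookmarks methods are hypothetical names, not Dynomantle's actual code.

```python
from unittest.mock import Mock


def add_bookmark(store, note_id, url):
    """Hypothetical function: save a bookmark, then return the note's bookmarks."""
    store.save_bookmark(note_id, url)
    return store.get_bookmarks(note_id)


def test_add_bookmark_with_mock():
    # The mock replaces the database, so no rows pile up between runs.
    store = Mock()
    store.get_bookmarks.return_value = ["https://example.com"]

    result = add_bookmark(store, note_id=1, url="https://example.com")

    store.save_bookmark.assert_called_once_with(1, "https://example.com")
    assert result == ["https://example.com"]
```

The test is consistent on every run, but notice what it does not check: whether a real database would actually store and return the bookmark.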
The following gif perfectly illustrates the limitations of unit testing:
The remote sensor triggers correctly with a human hand. The faucet correctly delivers water. The sink bowl would probably hold the water properly…. if only the faucet and the bowl interacted well.
This brings us to the topic of integration testing. This is where we test larger parts of our system. If you have an API endpoint that calls 5 functions, the unit tests would cover each of those 5 functions, but the integration test would run the API endpoint as a whole.
Using Dynomantle as an example, there is an endpoint to add a URL as a bookmark. The unit tests cover each individual function used to parse the HTML content and retrieve data necessary for link previews. The integration tests call the endpoint itself and validate that the link preview as a whole appears correctly.
It is a more representative test of how users are using the application. However, integration tests are often neglected compared to unit tests because they are not simple to write, not simple to execute, and are a struggle to get consistency with.
We still have the problem where every time you add a bookmark, the data in the database is changing. Unit tests are talked about more often because it is so difficult to get consistent executions with integration tests. To ensure consistency, the integration tests for Dynomantle involve:
- Setting up a database with a fresh schema
- Creating new search indices
- Creating test users
- Running the tests
- Deleting the database
- Deleting search indices
- Clearing Redis
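The database part of that list can be sketched with setUp and tearDown hooks from Python's unittest. This is an illustration under assumptions, not Dynomantle's setup: an in-memory SQLite database stands in for the real one, the schema and helper functions are invented, and the search index and Redis steps are left out (they would be created and cleared in the same hooks).

```python
import sqlite3
import unittest

# Assumed schema, for illustration only.
SCHEMA = """
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE bookmarks (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id),
    url TEXT NOT NULL
);
"""


def add_bookmark(conn, user_id, url):
    conn.execute("INSERT INTO bookmarks (user_id, url) VALUES (?, ?)", (user_id, url))
    conn.commit()


def get_bookmarks(conn, user_id):
    rows = conn.execute(
        "SELECT url FROM bookmarks WHERE user_id = ? ORDER BY id", (user_id,)
    )
    return [row[0] for row in rows]


class BookmarkIntegrationTest(unittest.TestCase):
    def setUp(self):
        # Fresh database and fresh test user before every test.
        self.conn = sqlite3.connect(":memory:")
        self.conn.executescript(SCHEMA)
        cur = self.conn.execute("INSERT INTO users (name) VALUES (?)", ("test-user",))
        self.user_id = cur.lastrowid

    def tearDown(self):
        # Closing an in-memory database deletes it.
        self.conn.close()

    def test_add_bookmark(self):
        add_bookmark(self.conn, self.user_id, "https://example.com")
        self.assertEqual(get_bookmarks(self.conn, self.user_id), ["https://example.com"])

    def test_new_user_has_no_bookmarks(self):
        self.assertEqual(get_bookmarks(self.conn, self.user_id), [])
```

Because every test starts from an empty schema, the bookmark count is one no matter how many times the suite runs.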
That list only grows as new systems are added or built. Dynomantle has it easy too. All data for a user is isolated to that user. That means each test can have its own user which makes running tests concurrently easy. A product that relies on aggregating a group of users’ data, or even all user data, will have a lot more complexity in the setup and execution.
The complexity of integration testing makes this form of testing potentially fragile. Few things frustrate a developer more than having a test unrelated to their change fail. Faulty tests create a gut reaction that the tests aren’t worth maintaining, even in people who strongly believe in tests.
An example of this is logic that uses timestamps. It is really easy to just create a timestamp exactly where you need it in the code. The problem here is: are your timestamps down to the millisecond or to the second? Integration tests execute faster than human users interact with your app. If your timestamps only go down to the second, then you can have multiple timestamps created in the same second. Anything you have that is sorted/ordered by timestamp will now be non-deterministic. Any test on sorting order will now fail intermittently, depending on how the ties happen to be ordered.
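A tiny sketch of the tie, with made-up bookmark names and timestamps: two items created 360 milliseconds apart end up with the same stored value once the timestamp is truncated to whole seconds.

```python
def truncate_to_seconds(timestamp):
    """Drop the fractional part, as a seconds-only column would."""
    return int(timestamp)


def is_sorted_by_timestamp(rows):
    return all(a[1] <= b[1] for a, b in zip(rows, rows[1:]))


# Made-up data: two bookmarks created within the same second.
created = [("bookmark-a", 1700000000.120), ("bookmark-b", 1700000000.480)]
stored = [(name, truncate_to_seconds(ts)) for name, ts in created]

# Both rows now carry the same timestamp...
assert stored[0][1] == stored[1][1]

# ...so both orderings count as "sorted by timestamp".
assert is_sorted_by_timestamp(stored)
assert is_sorted_by_timestamp(stored[::-1])
```

A test that expects bookmark-a before bookmark-b is relying on something other than the timestamp, which is why it passes on some runs and fails on others.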
It sounds like storing timestamps in milliseconds is an easy fix. Depending on your application though, that could have big storage or performance implications at scale. Do you make this decision solely for tests to run when it has no effect on your users?
You could also strategically decide where timestamps are created. It would look something like this:
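def func1(my_time): ...  # works with whatever timestamp it is given
def api_endpoint():
    my_time = time.time()  # created here only so it can be passed down
    func1(my_time)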
This allows you to at least test func1 with whatever timestamp you want in the test. More edge cases around sorting order can be accounted for in the func1 tests while the api_endpoint tests can be a bit more spare.
However, the code is a bit more cumbersome now because you have to create the timestamp in a place that does not need the timestamp and then pass it down. It doesn’t seem so bad in this example, but it can get really irritating in any real application.
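To make the trade-off concrete, here is a runnable sketch of passing the timestamp down. The bodies of func1 and api_endpoint are assumptions (a list of dicts stands in for the database); only the shape matters: the endpoint creates the timestamp and func1 receives it.

```python
import time


def func1(bookmarks, url, my_time):
    """Assumed behaviour: store a bookmark stamped with my_time, return newest first."""
    bookmarks.append({"url": url, "created_at": my_time})
    return sorted(bookmarks, key=lambda b: b["created_at"], reverse=True)


def api_endpoint(bookmarks, url):
    # The endpoint does not use the timestamp itself; it only passes it down.
    my_time = time.time()
    return func1(bookmarks, url, my_time)


def test_func1_orders_newest_first():
    bookmarks = []
    # The test picks the timestamps, so there are no accidental ties.
    func1(bookmarks, "https://a.example", my_time=100)
    result = func1(bookmarks, "https://b.example", my_time=200)
    assert [b["url"] for b in result] == ["https://b.example", "https://a.example"]
```

The func1 test no longer cares how fast it runs. The cost is the extra parameter on every function between the endpoint and the code that needs the time.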
There are lots of solutions around this problem, but every solution is going to have some trade off. And you still have to deal with the issue of thinking of these problems ahead of time. Timestamps are easy to think of because they’re so commonly used. Every application is going to have its own set of unique problems to deal with in integration tests though.
Reliable integration tests take time to build because you have to commit to solving the bugs in the tests themselves.
Despite the challenge, integration tests are a critical part of any automated testing strategy. To have the confidence to do a release without a full manual regression test, the automated tests need to run through scenarios as if they were real users taking action. That can’t be simulated with just unit tests.
Selenium tests are the highest fidelity automated tests you can have for web applications. A real browser is launched by the Selenium driver, the application is rendered as it would be in your user’s browser, and the test clicks, drags, and types in the application the same way your users would. It also doesn’t matter if you have a single page application or not. Selenium treats your application as a black box and doesn’t need to know anything other than the ids and class names of your html elements.
The nice thing is you can also watch the test run. Instead of reproducing a failed test case by looking at code, you can watch Selenium click through your application until it hits a failure case. You can even open up the browser console and start debugging when the test fails.
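A Selenium test for adding a bookmark might look roughly like the sketch below, using Selenium's Python bindings with Chrome. The page path, the element ids (bookmark-url, add-bookmark), and the link-preview class name are all hypothetical; swap in whatever your own application uses. It needs the selenium package and a Chrome install to actually run.

```python
def check_add_bookmark_happy_path(base_url):
    # Imported here so this file can be loaded on machines without Selenium.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")  # remove this line to watch the browser
    driver = webdriver.Chrome(options=options)
    try:
        # Hypothetical page and element ids.
        driver.get(base_url + "/bookmarks")
        driver.find_element(By.ID, "bookmark-url").send_keys("https://example.com")
        driver.find_element(By.ID, "add-bookmark").click()

        # Wait up to 10 seconds for the preview instead of sleeping a fixed time.
        preview = WebDriverWait(driver, 10).until(
            EC.visibility_of_element_located((By.CLASS_NAME, "link-preview"))
        )
        assert preview.text != ""
    finally:
        # Always close the browser, even when the test fails.
        driver.quit()
```

The test only knows about ids and class names, which is the black box approach in practice: it cannot tell how the preview was produced, only that it showed up.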
You may have already guessed the biggest downside of Selenium tests though: execution speed. Selenium can click faster than a human user, but it is still waiting for pages to load and API calls to complete. There’s an option for a “headless” mode, which runs the browser without a visible window, but it still takes a while for a test to execute. A series of integration tests that take 2 minutes to execute could take hours with Selenium. Even if you’re not doing the testing manually, that is a long time to wait.
Treating your application as a black box also means Selenium is not appropriate for testing certain technical aspects of your application such as whether data is being loaded from a cache or a database.
Selenium is probably best suited to testing the “happy path” in your application while leaving the majority of edge cases to be tested by integration tests.
Limitations of Automated Testing
There are a number of things that will still involve humans testing your software. The first of these is visual design. You can write an integration test or a Selenium test to make sure a button is clickable. It is a lot harder to test that the button is large enough to be visible, isn’t partially covered by another visual component, has the proper padding, or is even the right color. One possibility is to use screenshots and validate that the result of a test execution has a screen that looks the same as the screenshot. However, these tests can be very fragile, especially if you are iterating on your UI frequently.
Side note: if you know of a reliable way to overcome this limitation, feel free to send a message or tweet to @Dynomantle
Another limitation of automated testing is anything that involves a third-party service. Relying just on AWS or Google Cloud is one thing. Those services are built for scale and for you to do automated testing. If you have a product that deals with email, however, an automated test looks very much like spam. If you have a product that scrapes web pages, an automated test could look like a denial of service attack. If you have a product that relies on another service’s API, an automated test can hit any API limits very quickly.
Lastly, while automated tests can attempt to simulate user actions, the tests are only capable of doing what you tell them to do. Users are notorious in their ability to think of unintended uses of software. For developing a product, this is a blessing. For testing a product, this is a challenge. You will still get bugs and you will have to keep your tests up to date as you fix those bugs.
Hopefully this article was useful in helping you start with your own automated testing strategy. If you would like to see a detailed explanation of how to create tests in ReactJS/Typescript and Python, I put one up on Dynomantle. Sign up is free!