Testing is a de facto standard in software development. Take a look at popular projects on GitHub, and you will see that nearly all of them have tests.
Or search for books about testing, and you will find many of them.
In most job interviews, I am asked whether I test my code. There is no doubt that good developers must be skilled at testing code.
Testing is hard, because writing correct code is hard, and writing code that is easy to test is even harder. That's why, even though most developers I've talked to say that testing is essential, most of them do not test enough!
Junior developers often make the opposite mistake. Once they learn about testing, they want to prove that they can test their code, so they add tests everywhere. The real problem is that they test too much unimportant code and leave the most valuable, sensitive, and sophisticated code untested. Let's take a look at the example below.
    def main():
        args = parse_args()
        result = compute(args)
        return result
I once saw a senior developer add ten tests for the function parse_args and zero for the function compute. If he's not a junior developer, then he must not be an honest person. He wants to show that he has tests, even if they cover just 5% of the logic.
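For contrast, here is a minimal sketch of what tests aimed at compute could look like. The body of compute is a hypothetical stand-in (the real one is not shown in this article), and the test names are my own; the point is only that the tests target the business logic, including its edge cases, rather than the argument parsing.

```python
def compute(args):
    """Hypothetical stand-in for the real business logic: sum of squares."""
    return sum(x * x for x in args)


def test_compute_typical_input():
    assert compute([1, 2, 3]) == 14


def test_compute_empty_input():
    # Edge case: no values at all.
    assert compute([]) == 0


def test_compute_negative_values():
    # Edge case: the sign must not change the result.
    assert compute([-2, 2]) == 8
```

Functions named like this are picked up by pytest, assuming that is the test runner in use.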
In one company, I've even seen developers test code during the demo! Let's say it is a Spark streaming application that takes some messages from Kafka and sends them to a database. During the demo, the developer said: "Hey, you see, I've just created a message and sent it to the Kafka broker. You can see it in the logs, and now when I go to the database, I can see the value persisted." This developer did not write any tests for this feature. The feature is only known to work during the demo, and any later change can break it.
When we talk about legacy code, we mostly think of projects over ten years old, with developers coming and going. It's hard to change anything because there are not enough tests and the code already works in production.
But I've seen a project that is not yet used in production and contains lots of legacy code with no tests.
How can we refactor legacy code without breaking anything when there are almost no tests in place? Integration tests are the solution.
An integration test differs from a unit test in these respects:
- A unit test covers a single function. An integration test covers the whole application.
- A unit test uses mocks to fake the dependencies (database connection, web request, etc.). An integration test uses real components.
- A unit test is very cheap and quick. An integration test is expensive and slow.
Unit tests and integration tests are complementary.
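The difference between a mocked dependency and a real one can be sketched as follows. Everything here is an assumption made for illustration: save_value is a hypothetical function, and an in-memory SQLite database stands in for the real database. A true integration test would run the whole application, not one function.

```python
import sqlite3
from unittest import mock


def save_value(connection, value):
    """Hypothetical function under test: persist one value."""
    connection.execute("INSERT INTO results (value) VALUES (?)", (value,))
    connection.commit()


def unit_test_save_value():
    # Unit style: the connection is a mock, so no database is involved.
    # We can only check which calls were made, not what was stored.
    fake_connection = mock.Mock()
    save_value(fake_connection, 42)
    fake_connection.execute.assert_called_once_with(
        "INSERT INTO results (value) VALUES (?)", (42,)
    )
    fake_connection.commit.assert_called_once_with()


def integration_test_save_value():
    # Integration style: a real database engine executes the SQL,
    # so we can check the value that was actually persisted.
    connection = sqlite3.connect(":memory:")
    try:
        connection.execute("CREATE TABLE results (value INTEGER)")
        save_value(connection, 42)
        rows = connection.execute("SELECT value FROM results").fetchall()
        assert rows == [(42,)]
    finally:
        connection.close()
```

The mocked version would still pass if the SQL statement were invalid. Only the version that uses a real component would catch that.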
In an integration test, the hard work is to create a script that does these things:
- Prepares the input data.
- Prepares the external components, such as a database or a message broker (they must be in a clean state).
- Prepares the configuration, which is usually different from the one in production.
- Launches the jobs.
- Waits for the jobs to complete, then shows any errors that occurred.
- Cleans up the data generated by the test if it is successful.
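The steps above can be sketched as a small Python driver. This is only a sketch under assumptions: the file names input.json, config.json and output.json, the config keys, and the convention that the job takes the config path as its last argument are all hypothetical. Preparing external components (database, broker) depends on the project and is left as a comment.

```python
import json
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path


def run_integration_test(job_command, input_rows):
    """Run one job in an isolated folder; return (succeeded, stderr_text)."""
    work_dir = Path(tempfile.mkdtemp(prefix="integration-test-"))

    # Prepare the input data.
    (work_dir / "input.json").write_text(json.dumps(input_rows))

    # Prepare external components here (e.g. an empty test database).
    # Omitted: it depends entirely on the project.

    # Prepare a test-only configuration (hypothetical keys).
    config = {"input": "input.json", "output": "output.json"}
    (work_dir / "config.json").write_text(json.dumps(config))

    # Launch the job and wait for it to complete.
    completed = subprocess.run(
        job_command + ["config.json"],
        cwd=work_dir,
        capture_output=True,
        text=True,
        timeout=600,
    )

    succeeded = completed.returncode == 0
    if succeeded:
        # Clean up the generated data only when the test passed.
        shutil.rmtree(work_dir)
    else:
        # Show the errors and keep the folder for investigation.
        print(f"Job failed, files kept in {work_dir}", file=sys.stderr)
        print(completed.stderr, file=sys.stderr)
    return succeeded, completed.stderr
```

Keeping the folder on failure is a deliberate choice here: the generated files are usually the quickest way to understand what went wrong.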
In a microservices architecture, many applications work together in the data pipeline. The integration test should launch them one after another and verify all the intermediate results.
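A minimal sketch of that idea is shown below. To keep it self-contained, each stage is a plain Python callable standing in for a real application, and each check is a hypothetical predicate on the intermediate result. In a real pipeline, the stage would launch a service or a job, and the check would read what it wrote.

```python
def run_pipeline(stages, initial_input):
    """Run the stages in order and verify each intermediate result.

    `stages` is a list of (name, run_stage, check_output) tuples.
    """
    data = initial_input
    for name, run_stage, check_output in stages:
        data = run_stage(data)
        # Stop at the first stage whose result is wrong, so the failure
        # points at the application that caused it.
        if not check_output(data):
            raise AssertionError(f"stage {name!r} produced an unexpected result")
    return data


# Usage with two stand-in stages.
stages = [
    ("double", lambda rows: [r * 2 for r in rows], lambda out: len(out) == 3),
    ("total", sum, lambda out: out == 12),
]
final_result = run_pipeline(stages, [1, 2, 3])
```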
The most challenging requirement of the integration test is that executions must be independent. Developers and build machines should be able to run tests at the same time. Therefore, the folder that contains all the files needed during the test (executable binaries, configuration, and data) should have a random name. In my project, I chose a UUID to generate a random name for each execution.
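Here is a minimal sketch of the UUID-named folder. The function name create_run_folder and the "run-" prefix are my own choices for illustration, not taken from the project.

```python
import uuid
from pathlib import Path


def create_run_folder(base_dir):
    """Create and return a uniquely named folder for one test execution."""
    run_dir = Path(base_dir) / f"run-{uuid.uuid4()}"
    # exist_ok=False makes a name clash fail loudly instead of silently
    # sharing a folder with another execution.
    run_dir.mkdir(parents=True, exist_ok=False)
    return run_dir
```

Two executions that call create_run_folder with the same base directory get different folders, so a developer and a build machine can run the tests at the same time without touching each other's files.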
So far, the tests I've described only prove that these applications finish successfully. But what about the values they produce?
We need a script that checks whether these values are correct. For example, the length of the input and the output must be the same. But as I mentioned at the beginning of this article, we often deal with projects that do not have enough tests, and we have to start writing some tests before refactoring. Most of the time, the product owner does not know all the specs, and the original developers of the project are not available. So how do we check the output values of the tests?
It turns out that it is not hard at all. We can borrow the idea from the regression tests that I described in an old post. The idea is simple: we run the tests for the first time and copy all the output values into another folder. Then we write a script that compares the results of new test runs to these reference values. These tests make sure that we don't change any values during refactoring. When new features modify the values, we can safely copy the new values into the references. Each time we do that, we have to make sure the product owner is OK with the new values.
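The comparison script can be sketched as below. It assumes that outputs are files in a folder and that the jobs are deterministic, so a byte-for-byte comparison is meaningful. Outputs that contain timestamps or unordered rows would need to be normalized first. The function names are hypothetical.

```python
import shutil
from pathlib import Path


def list_files(root):
    """Return the relative paths of all files under `root`."""
    root = Path(root)
    return {str(p.relative_to(root)) for p in root.rglob("*") if p.is_file()}


def save_reference(output_dir, reference_dir):
    """Replace the references with the current outputs (first run or approved change)."""
    if Path(reference_dir).exists():
        shutil.rmtree(reference_dir)
    shutil.copytree(output_dir, reference_dir)


def find_differences(output_dir, reference_dir):
    """Return the sorted relative paths that differ from the references."""
    produced = list_files(output_dir)
    expected = list_files(reference_dir)
    # Files that exist on one side only count as differences.
    differences = produced ^ expected
    for name in produced & expected:
        new_bytes = (Path(output_dir) / name).read_bytes()
        reference_bytes = (Path(reference_dir) / name).read_bytes()
        if new_bytes != reference_bytes:
            differences.add(name)
    return sorted(differences)
```

An empty list from find_differences means the refactoring did not change any output. A non-empty list is either a regression or an intended change that needs approval before save_reference is called again.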
Most developers agree that tests are essential. But in many projects, even those with experienced developers, I don't see enough tests. When there are not enough tests, new developers are afraid of changing the code base. The solution is to create global integration tests that run all the applications one after another, then compare the output values to reference values that we saved earlier. The challenge with these tests is making them run independently and concurrently.
I hope this article helps you to start writing more tests in your project.