Let’s move on to performance and reliability issues in automated software security testing. At Cossack Labs, we trigger automated benchmarks after each PR. Though it takes time, this lets us notice performance degradation early, while a feature is still in development 🔐
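A per-PR benchmark job can start out very small: a script that times the hot path and prints a machine-readable number that CI stores per commit. A minimal sketch, assuming a hypothetical `encrypt_payload` as the operation under test (it is a stand-in, not our actual pipeline):

```python
import time

def encrypt_payload(data: bytes) -> bytes:
    """Stand-in for the operation under benchmark (illustrative only)."""
    return bytes(b ^ 0x5A for b in data)

def benchmark(func, payload, iterations=1000):
    """Time `iterations` calls and return operations per second."""
    start = time.perf_counter()
    for _ in range(iterations):
        func(payload)
    elapsed = time.perf_counter() - start
    return iterations / elapsed

if __name__ == "__main__":
    ops = benchmark(encrypt_payload, b"x" * 1024)
    # CI can store this number per commit and chart the trend over PRs
    print(f"ops/sec: {ops:.0f}")
```

The point is not the harness itself but the habit: one number per commit is enough to spot a trend before it becomes an incident.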
When it comes to security testing, performance testing is hardly the first thing that springs to mind.
However, reliable performance is the first step towards the safe and secure functioning of a system.
One of the risks to consider in this context is a denial of service caused by an overload or by (D)DoS-type attacks. Regardless of the cause, it is crucial to estimate, already at design time, the expected load level and the point beyond which denial of service occurs.
Despite the lack of a single yardstick for different systems, it is possible to get useful testing results.
If they are recorded and published along with the characteristics of the testing platform, they will help the system architects. Running performance tests on the target equipment after the installation and configuration of the software will also yield more precise threshold levels. This, in turn, will help to configure the load-limiting and alert systems accordingly.
📎Performance testing keeps you up to date with values such as:
- the number of operations performed per unit of time (both overall and segmented into groups: normal mode of operation, invalid data, “light”/”heavy” input data, etc.),
- the number of errors that arise during the execution of operations (both overall and by type),
- the resources required for each testing mode.
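All three values can come out of one harness. A sketch, with a hypothetical `process` function and illustrative test modes, that records ops/sec, errors grouped by type, and peak memory per mode:

```python
import time
import tracemalloc
from collections import Counter

def process(item):
    """Hypothetical operation under test: rejects invalid input."""
    if not isinstance(item, bytes):
        raise TypeError("binary input required")
    if len(item) == 0:
        raise ValueError("empty input")
    return item[::-1]

def run_mode(name, inputs):
    """Run one testing mode; report ops/sec, error counts, peak memory."""
    errors = Counter()
    tracemalloc.start()
    start = time.perf_counter()
    for item in inputs:
        try:
            process(item)
        except Exception as exc:
            errors[type(exc).__name__] += 1  # segment errors by type
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return {"mode": name,
            "ops_per_sec": len(inputs) / elapsed,
            "errors": dict(errors),
            "peak_bytes": peak}

normal = run_mode("normal", [b"abc"] * 1000)
invalid = run_mode("invalid data", [b"", "oops", b"ok"] * 300)
```

Segmenting by mode (normal vs. invalid data here) is what makes the numbers comparable between runs.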
While choosing a preferred methodology for running such tests, keep your end goals in focus.
📎Depending on them, you can make use of various approaches:
- Testing separate functions and modules (or groups of modules). Run it to find the “bottlenecks” and to estimate the performance of a particular module. Note that the tested module should run with its regular resource allocation, while all the other modules are not limited in their resources.
- Complex testing. Opt for it to evaluate the interaction between separate elements of the software tested as a whole. The testing environment in this case may differ from the real target environment.
- Testing with approximation of real conditions. This time round, the testing process is aimed at evaluating software performance under the anticipated operating conditions. The testing environment should emulate the real target working conditions as closely as possible and eliminate the influence of side components.
When things work “better” than expected, start worrying. At the development stage, the irony is that a sudden jump in the app’s performance, in either direction, can serve as an indicator of an error.
A sudden drop in performance stats can be caused by errors that lead to performance degradation and overload failures. A sudden increase in performance can indicate changes in the logic of the working code, for example an erroneous exclusion of the incoming data validation step.
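One way to turn this irony into an automated check: compare each run against a stored baseline and alert on large deviations in either direction. The tolerance value here is illustrative; tune it to your workload’s natural variance:

```python
def check_against_baseline(ops_per_sec, baseline_ops, tolerance=0.30):
    """Flag runs that deviate from the baseline by more than `tolerance`.

    A drop suggests degradation or overload failures; a jump may mean a
    skipped step (e.g. validation accidentally removed), so both
    directions raise an alert.
    """
    ratio = ops_per_sec / baseline_ops
    if ratio < 1 - tolerance:
        return "ALERT: performance dropped, possible degradation"
    if ratio > 1 + tolerance:
        return "ALERT: performance jumped, check the code path"
    return "OK"
```

Alerting on improvements feels counterintuitive, but it is exactly what catches an accidentally deleted validation step before an attacker does.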
📎Among a vast number of testing patterns, we find these three types as the most relevant to the subject:
- Stress testing, meaning a step-by-step load increase which helps to identify the performance limits under data and user overload. It also opens paths to evaluating the estimated nominal load level and the system’s recovery after excess load.
- Endurance testing. This long-term software performance evaluation enables finding memory leaks and cumulative errors.
- Spike testing. Testing with sudden spikes in the load helps to surface problems that can arise during unexpected breakdowns in the normal functioning of load balancing and routing, or during (D)DoS attacks.
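The three patterns differ mainly in how the load varies over time. A sketch that generates a request rate per time step for each pattern (all numbers are illustrative, and a real load generator would feed these schedules into actual traffic):

```python
def stress_schedule(steps, start=10, increment=10):
    """Step-by-step load increase until the performance limit is found."""
    return [start + i * increment for i in range(steps)]

def endurance_schedule(steps, rate=50):
    """Constant moderate load held for a long period (leak hunting)."""
    return [rate] * steps

def spike_schedule(steps, base=10, spike=500, spike_at=(3, 7)):
    """Mostly quiet traffic with sudden bursts at the given steps."""
    return [spike if i in spike_at else base for i in range(steps)]
```

Expressing the patterns as data keeps the load generator itself identical across all three test types; only the schedule changes.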
In our experience, classic performance testing methods combined with other kinds of testing lead to better results. Such combinations provide a more complete picture than running performance tests only in the regular modes of operation.
For instance, carrying out stress testing with invalid input data allows you to evaluate how well the validation mechanism works. Endurance testing carried out with deliberately “heavy” data yields a valid estimation of resource consumption.
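Combining the two is straightforward once load is expressed as a schedule: feed growing batches of malformed data and assert that the validator rejects every record even at the highest volume. A sketch with a hypothetical digits-only validator and a trivial fuzzer:

```python
import random

def validate(record: str) -> bool:
    """Hypothetical validator: accepts only digit-only records."""
    return record.isdigit()

def stress_validation(volumes, fuzz=lambda: "x" * random.randint(1, 64)):
    """Feed growing batches of invalid data; the validator must reject all.

    Returns one boolean per batch: True means nothing slipped through.
    """
    results = []
    for volume in volumes:
        rejected = sum(1 for _ in range(volume) if not validate(fuzz()))
        results.append(rejected == volume)
    return results
```

In a real setup the fuzzer would produce genuinely hostile inputs, and the batch sizes would follow the stress schedule rather than a fixed list.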
Performance testing shows you how the components of a system behave in given situations. This stage is indispensable for preventing attack vectors that exploit non-standard events to cause (D)DoS. Attackers rightly assume that in such cases most products will experience a striking performance drop, and that deeply branched program logic may be untested for such situations and, as a result, vulnerable.
Most likely, you do backups for the data inside your system. Data recovery, once it is required, can be a stressful scenario in itself, so one would hardly relish additional pressure of worrying whether backups are valid or not.
The solution, of course, is to test that backups work by actually restoring data from them. This hard, time-consuming task doesn’t yield obvious immediate returns, but those returns become vital in a challenging situation.
Backup testing should include testing of physical recovery, virtual recovery (using virtual environments), data recovery, and full application recovery. Trust us, backup testing is crucial when you deal with encrypted data and cryptographic keys; make sure you test decrypting restored data.
In a perfect world, every backup would be tested right after it’s created; in practice, it is more feasible to include backup testing in the regular backup cycle, or to perform it after any significant changes in the application or the application data.
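A restore check can be automated: extract the backup into a scratch directory and compare content hashes against the live data. A sketch assuming a plain tar archive with paths relative to the data directory (for encrypted backups, a decryption step would precede the hashing):

```python
import hashlib
import tarfile
import tempfile
from pathlib import Path

def sha256_tree(root: Path) -> dict:
    """Map each file's relative path to its SHA-256 digest."""
    return {str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
            for p in sorted(root.rglob("*")) if p.is_file()}

def verify_backup(backup_tar: Path, live_dir: Path) -> bool:
    """Restore the archive into a scratch dir and compare hashes."""
    with tempfile.TemporaryDirectory() as scratch:
        with tarfile.open(backup_tar) as tar:
            tar.extractall(scratch)
        return sha256_tree(Path(scratch)) == sha256_tree(live_dir)
```

Comparing against live data verifies a fresh backup; for older backups you would compare against recorded hashes from the time the backup was taken instead.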
Is it possible to forego the backup testing and only concentrate on testing the main system, hoping the backups will just mirror it?
Well, they say nothing’s impossible in this world, but assuming that something works is not the same as testing it and knowing for sure. For a story of struggle and loss (of several hours’ worth of data), recall the incident when untested backups failed GitLab.
Running tests on larger entities has its own issues and pitfalls.
When you test your product or website for vulnerabilities, automated vulnerability tests basically try to wreak as much havoc and do as much damage as possible in the process. If run against live infrastructure, testing can bring your application down (e.g. when malicious injections or simulated (D)DoS attacks work too well).
Your email can be flooded, logs overflowed, sensitive links crawled and exposed for the whole world to see, and servers taken down, thanks to the overly, let’s say, efficient work of automated vulnerability scanners on whatever they’ve been set to battle-test.
Sometimes a diligent approach to automated security testing can leave you slightly overwhelmed and tempted to restrict the processes that you do not wish to test (or to have carried out on the tested material). Most scanners actually provide settings that allow you to choose and limit their "rampage" options.
Still, it is better to see everything messed up and broken, and to fix it knowing that the worst has already happened, without terrible consequences or hostages to fortune, and with your total control and blessing. The sorry alternative is seeing some minuscule, previously irrelevant-seeming component that was left out of the check become an outside attacker’s entrance into your system.
Now and again, security tests require human interpretation. Such tests are better run on builds rather than on every commit, which makes them challenging to fit into modern CI/CD approaches.