TLDR; Our company has dockerized pretty much everything, so to tackle the long waiting time of our CI pipeline, we split the big list of behat feature files into chunks and run them in parallel inside docker using python multiprocessing.
Background:
In my workplace, we deploy on a daily basis. Each time we want to deploy, we have to wait at least 20 mins for the pipeline to finish. On average it takes around 25~30 mins from the time we push the code to our version control until all the checks finish running and we can deploy to production.
The main reason for this delay is that we have a lot of behat .feature files. We put a strong emphasis on integration testing, so the test files keep growing.
All this waiting is frustrating whenever we want something in production fast. It's even more frustrating when we're working with 3rd parties and they have to wait for us to apply changes.
The search for cutting down time
We've searched for various ways to reduce the time. Some of the suggestions were:
- Remove deprecated features and their .feature files
- Refactor the .feature files to remove redundancy
- Search for 3rd party packages that deal with parallelism
Point 1: we've been doing this regularly, but it didn't reduce the time as much as we expected.
Point 2: no one really wants to take the time to find redundant feature files and refactor them.
Point 3: 3rd party packages were unable to satisfy all our requirements.
All things considered, we decided to create our own parallel behat runner.
The journey to achieve parallelism
Step 1: How do we even run processes in parallel?
TLDR; I don't fully understand the intricate logic behind parallel processing myself. What I do know is that I've been using the python multiprocessing package and it has worked wonders!
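If you've never touched the package before, the basic pattern is small enough to show in a toy example. This snippet is just an illustration of start/join, not part of our pipeline:

import multiprocessing

def say_hello(worker_id):
    print(f"hello from worker {worker_id}")

if __name__ == "__main__":
    workers = [multiprocessing.Process(target=say_hello, args=[i]) for i in range(2)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()  # wait for every worker to finish

Each Process runs its target in a separate OS process, which is what lets us run several behat suites at the same time.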
Step 2: Add a script that takes specific input to execute
Python multiprocessing needs something for each process to run. So I created a simple bash script that takes in .feature folder paths and executes all the files within them. It is basically a self-contained docker process that runs the tests and dies.
Step 3: Chunk it!
This part is easy. Let's say we have .feature folders A, B, C, D, E, F. Splitting them into two chunks gives us [[A,B,C], [D,E,F]].
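The get_chunks helper used in the code further down is not shown in the real script, but a minimal version could look roughly like this (the folder names are hard-coded here purely for illustration; the real one discovers them itself):

import math

def get_chunks(sub_process_count):
    # Hard-coded for illustration; in practice the folder list is discovered
    folders = ["A", "B", "C", "D", "E", "F"]
    chunk_size = math.ceil(len(folders) / sub_process_count)
    return [folders[i:i + chunk_size] for i in range(0, len(folders), chunk_size)]

# get_chunks(2) -> [['A', 'B', 'C'], ['D', 'E', 'F']]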
Step 4: Putting it all together
Finally, we use python multiprocessing to run the script, passing one chunk to each sub process. By splitting the work into 2 sub processes, we managed to get the time down from 20 mins to 14 mins. We're still in the testing phase of increasing the number of sub processes. The hypothesis is: more sub processes, less execution time, provided we have enough CPU cores.
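One way to avoid over-subscribing the machine while we experiment, assuming the sub process count stays configurable, would be to cap it at the number of available cores. This is a hypothetical tweak on my part, not something we currently do:

import multiprocessing

desired_sub_processes = 4  # hypothetical value, tune per environment
sub_process_count = min(desired_sub_processes, multiprocessing.cpu_count())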
The code
For company privacy reasons, I will not be sharing the exact code. This is a simplified, obfuscated version; some things may not make sense to you, and some edge cases are not handled here, but they are handled in our actual version.
sub_process_test.py
import multiprocessing
import subprocess

def main():
    processes = []
    sub_process_count = 2
    chunks = get_chunks(sub_process_count)
    for index, chunk in enumerate(chunks):
        # Each sub process gets a 1-based id and a comma-separated list of folders
        current_process = multiprocessing.Process(
            target=run_behat_for_certain_folders,
            args=[index + 1, ','.join(chunk)]
        )
        current_process.start()
        processes.append(current_process)
    # Wait for every sub process to finish before exiting
    for process in processes:
        process.join()

def run_behat_for_certain_folders(pid, folder_names):
    if folder_names:
        subprocess.call(f"./sub_process_runner.sh {folder_names} {pid}", shell=True)

if __name__ == "__main__":
    main()
sub_process_runner.sh
#!/bin/bash

run_sub_process() {
    # Inside this function: $1 = a single feature folder, $2 = the sub process id
    docker-compose -f "$docker_compose_path" -p "$prefix" exec -T test ./vendor/bin/behat \
        --stop-on-failure \
        ./your-behat-folder/$1
    # status check here, obfuscated
}

# docker initialization here, obfuscated
# Script arguments: $1 = comma-separated folder list, $2 = sub process id
for i in ${1//,/ }
do
    run_sub_process $i $2
done
sub_process_test.py will call sub_process_runner.sh A,B,C 1, where the 1st arg is the list of folders it needs to run and the 2nd arg is the sub process id.
Conclusion
We achieved our goal of reducing the CI pipeline waiting time. Hopefully this post brings some insights to the reader, whatever those insights may be.