Creating a self-sequential pipeline in Jenkins

#groovy #jenkins #ci #devops

(header image derived from image by Wikipedia user Inductiveload)

So I had the absolute pleasure this fortnight of implementing what my company calls a "lifecycle pipeline" in Jenkins. Essentially, it runs a script over and over and over again to flex the mechanics of our physical product, using our CI pipeline infrastructure. It produces a lot of log data, and for something that stands to be running for weeks, even months, we wanted to be sure that we didn't max out the storage on the little Raspberry Pi powering the process, and that the extracted data was being uploaded at intervals.

To do this, we used to have a standard Jenkins job that would run, upload the data from its own run to S3, and then check how many runs it had done. If it was not yet at the maximum, it would call itself again for a new run.

Of course, we had decided it was probably time to define our jobs as code, so that changes are tracked properly. So I implemented this one as a Jenkinsfile pipeline. I had done a little of this before, but there's no learning experience quite like using a tech in anger.

And boy was I angry.

For a while at least, until pieces fell into place.

The one thing that really gets to me is the confusing mess of declarative vs scripted pipelines. Both are written in Groovy and both use similar structures, but sometimes one mode forbids you from doing something you can do in the other. It's madness.

I've made my thoughts known about Jenkins before and was ready to rant heavily at the end of this project. I'm still not super-happy with the result, but now that I have a better grasp of what I'm dealing with, I can at least navigate the differences a bit more easily. Being tripped up by seemingly arbitrary and unhelpful "features" is, after all, the main reason for disliking any given technology in the beginning. I am a fervent vim user, even though I was frustrated with it early on; I am now pretty much the git-scm expert in my part of the company; and even though Linux and its terminals utterly confused me to start with, I'm now a shill for command-driven experiences.

After a while, a given technology makes sense: you see the method in the madness. You do have to ride that handcart through hell first though.

After persevering a few days, I finally got three scripts done which solve the problem I was having. Here now are the fruits of my labour.

The Dispatcher

The first script is the dispatcher - a simple pipeline that takes a label, gets all the nodes attached to it, and calls the executor job on each of them. Arguably, this is a component that is not needed if the lifecycle is only ever run on a single test node, as its sole responsibility is to throw the actual lifecycle at the various nodes and let them party on their own.

The function declared before the pipeline gets a list of the nodes that have the label lifecycle_runners and, for each one, builds a closure that wraps a build call to the executor job, passing the required parameters down. The resulting map of node name to closure is then passed to the Jenkinsfile parallel step in one go.

The original implementation also wrapped a node(...) block inside each closure, because the person whose work I was piggybacking on thought it was necessary. The result was a mix of declarative and scripted pipeline code, which was arguably the key source of my frustration.

def prepare_all_nodes_for_test() {
    // nodesByLabel returns the names of the nodes carrying this label
    def nodelist = nodesByLabel(label: "lifecycle_runners")
    def all_node_tests = [:]

    for (agent_name in nodelist) {
        // Copy into a local so each closure captures its own node name
        def target = agent_name
        print "Preparing task for " + target

        // The closure is not run here; parallel runs it later
        all_node_tests[ "${target}" ] = {
            build job: 'Job-executor',
            parameters: [
                string(name: 'TEST_AGENT_LABEL', value: "${target}"),
                string(name: 'MAX_LOOPS', value: "${params.MAX_LOOPS}")
            ]
        }
    }

    return all_node_tests
}


pipeline {
    agent none

    parameters {
        // Deliberately invalid default value, to prevent misconfigured runs on target agents
        string(name: 'TEST_AGENT_LABEL', defaultValue: '(label name)', description: 'Run pipeline against a node with this name/label')
        string(name: 'MAX_LOOPS', defaultValue: '100', description: 'Max number of loops to run')
    }

    stages {
        stage('Kick-off lifecycle runs on agents') {
            steps {
                script {
                    def prepared_node_tests = prepare_all_nodes_for_test()
                    parallel prepared_node_tests
                }
            }
        }
    }
}
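Outside Jenkins, the shape that parallel expects can be tried out in plain Groovy. The sketch below is only an illustration: the node names are made up, a hypothetical `fakeBuild` closure stands in for the real build step, and the closures are called one after another, whereas parallel would run them concurrently. It shows that each map value is an unevaluated closure, and that copying the loop variable into a local inside the loop body gives each closure its own node name.

```groovy
// Plain Groovy illustration (not a Jenkinsfile): build a map of
// node name -> closure, the same shape the dispatcher hands to `parallel`.
// The node names and fakeBuild are hypothetical stand-ins.

def fakeBuild = { String agent, String maxLoops ->
    "Job-executor on ${agent} (MAX_LOOPS=${maxLoops})".toString()
}

Map<String, Closure> prepareAllNodesForTest(List<String> nodelist, String maxLoops, Closure build) {
    Map<String, Closure> tasks = [:]
    for (agentName in nodelist) {
        // Fresh local per iteration, so each closure keeps its own node name
        String target = agentName
        tasks[target] = { -> build(target, maxLoops) }
    }
    return tasks
}

def tasks = prepareAllNodesForTest(['pi-rig-01', 'pi-rig-02'], '100', fakeBuild)

// Nothing has run yet: the map only holds closures.
// `parallel` would invoke them concurrently; here they are called in turn.
def results = tasks.collectEntries { name, task -> [(name): task()] }
results.each { name, result -> println "${name}: ${result}" }
```

The local copy is a cheap guard: closures created inside Groovy loops are a commonly reported source of "every branch ran against the last node" surprises.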

The Executor

This is the pipeline in which we do the actual work, and implement the "self-calling" nature of it. It can be called on a single node directly, without the assistance of the dispatcher.

It looks like there's quite a bit going on here, but most of it is just demo boilerplate to show off some sections that are useful. I added comments rather than trying to explain them out of context.

The one item I would call out is that the test script maintains a lifecycle.properties file as a way of feeding information back to the pipeline itself: the pipeline loads that file into its own context, so that we can discover what the script says happened. Namely, we get the LOOPS_FINISHED count out of it.

The test suite script does this by writing any number of KEY=value pairs into lifecycle.properties, one per line, much as you would write shell environment variable assignments. The KEY must consist only of ASCII letters, digits and underscores, followed by an equals (=) sign and a value, with no surrounding whitespace. When the file is read with the Jenkinsfile readProperties step, the pairs are loaded into a map object that can then be referenced like any other.
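As a rough illustration of that contract, here is a plain Groovy sketch of the round trip. In the real setup my_test_file.py writes the file; `incrementLoops` is a hypothetical stand-in for that side, and `java.util.Properties` stands in for readProperties on the Jenkins side (an assumption for illustration; the step may differ in details). Since `java.util.Properties` accepts more than the rule above, such as `:` separators and whitespace around `=`, the key rule is checked separately by the hypothetical `validLine` helper.

```groovy
// Plain Groovy sketch of the lifecycle.properties round trip (not Jenkins code).

// The rule from the text: KEY of ASCII letters, digits and underscores, '=',
// then a value with no surrounding whitespace
boolean validLine(String line) {
    line ==~ /[A-Za-z0-9_]+=\S(.*\S)?/
}

// Jenkins side (stand-in): parse KEY=value text into a plain map
Map<String, String> parseProps(String text) {
    Properties props = new Properties()
    props.load(new StringReader(text))
    props.collectEntries { k, v -> [(k.toString()): v.toString()] }
}

// Test-script side (hypothetical): read the previous count (0 if absent),
// add one finished loop, and write the file contents back
String incrementLoops(String previousText) {
    int finished = Integer.parseInt(parseProps(previousText).LOOPS_FINISHED ?: '0')
    "LOOPS_FINISHED=${finished + 1}\n".toString()
}

String text = ''                      // first run: no file contents yet
3.times { text = incrementLoops(text) }

assert text.readLines().every { validLine(it) }
println "Loops executed: ${parseProps(text).LOOPS_FINISHED}"
```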

/* Loaded in the post "always" block but needed in the "success" block, and a
   variable declared with def inside one script { } block is not visible in another.
   Alas, global variable
*/
def lifecycle_props = [:]

/* We need a declarative pipeline to set the parameterization block as code
*/
pipeline {
    agent none

    parameters {
        // Deliberately invalid default value, to prevent misconfigured runs on target agents
        string(name: 'TEST_AGENT_LABEL', defaultValue: '(agent name)', description: 'Run pipeline against a node with this name/label')
        string(name: 'LOOPS_FINISHED', defaultValue: '0', description: 'Number of loops run so far')
        string(name: 'MAX_LOOPS', defaultValue: '100', description: 'Max number of loops to run')
    }

    stages {
        stage("Running Lifecycle script") {
            agent { label "${params.TEST_AGENT_LABEL}" }
            steps {
                script {
                    git branch: "main", changelog: false, credentialsId: 'gitlab-auto', poll: false, url: 'https://gitlab.com/owner/repo'

                    /* We set the PYTHONPATH environment at run-time as
                    it is not picked up properly when run in-advance
                    with an "environment { }"
                    */
                    // my_test_file.py reads LOOPS_FINISHED from the "lifecycle.properties" file in the workspace, and keeps incrementing it
                    sh 'PYTHONPATH=${WORKSPACE} testSuite directory/structure/my_test_file.py > my_results.xml'
                }
            }
            post {
                always { // Even on fail, perform the upload steps etc
                    script {
                        // file created by my_test_file.py - a "KEY=value" text file store
                        lifecycle_props = readProperties file: 'lifecycle.properties'
                        addShortText text: "${NODE_NAME}: Loops executed: ${lifecycle_props.LOOPS_FINISHED}"

                        println "Archiving results for: ${NODE_NAME}"

                        /* Archiving must be on a path relative to the workspace, and in the workspace
                        An absolute path, even inside the workspace, will fail
                        */
                        archiveArtifacts artifacts: "my_results.xml", followSymlinks: false

                        // Upload to AWS/S3 bucket to collect all files into a single location
                        withAWS(region:'mordor-south-66', credentials:'s3-credentials') {
                            s3Upload bucket: 'lifecycle-logs',
                                file: "my_results.xml",
                                path: "lifecycle-logs/ticketID-lifecycle/my_results.xml"
                        }
                    } // script
                } // always

                success {
                    script {

                        if(Integer.parseInt("${lifecycle_props.LOOPS_FINISHED}") < Integer.parseInt("${params.MAX_LOOPS}") ) {
                            println "Finished loops ${lifecycle_props.LOOPS_FINISHED} has not reached Max Loops ${params.MAX_LOOPS}. Triggering new build."
                            // "Job-executor" is the name of the job that uses this pipeline file
                            build job: 'Job-executor',
                                parameters: [
                                    string(name: 'TEST_AGENT_LABEL', value: "${NODE_NAME}"),
                                    string(name: 'MAX_LOOPS', value: "${params.MAX_LOOPS}"),
                                    string(name: 'LOOPS_FINISHED', value: "${lifecycle_props.LOOPS_FINISHED}"),
                                ],
                                // Do not wait, otherwise the child job will be waiting on its parent to finish
                                wait: false
                        } else {
                            println "Finished loops ${lifecycle_props.LOOPS_FINISHED} has reached Max Loops ${params.MAX_LOOPS}. Build sequence finished."
                        }
                    } // script
                } // success
            } // post
        } // stage
    } // stages
} // pipeline
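The "run again?" decision in the success block can be pulled out and checked in isolation. This is a plain Groovy sketch with a hypothetical helper, `shouldTriggerNextRun`. It also covers a missing LOOPS_FINISHED key: in the pipeline above, `"${lifecycle_props.LOOPS_FINISHED}"` would then render as the string "null", and `Integer.parseInt` would throw a NumberFormatException.

```groovy
// Plain Groovy sketch of the executor's re-trigger decision (not Jenkins code).
// shouldTriggerNextRun is a hypothetical helper, not part of the original pipeline.

boolean shouldTriggerNextRun(Map props, Map params) {
    def finished = props.LOOPS_FINISHED
    if (finished == null) {
        // No count reported: stop the chain rather than throw or loop forever
        return false
    }
    return Integer.parseInt(finished.toString().trim()) <
           Integer.parseInt(params.MAX_LOOPS.toString().trim())
}

// Simulate a chain of builds, each finishing one more loop than the last
def jobParams = [MAX_LOOPS: '3']
int loops = 0
List<Integer> runs = []
boolean again = true
while (again) {
    loops++                              // the test script "finishes" a loop
    runs << loops
    again = shouldTriggerNextRun([LOOPS_FINISHED: loops.toString()], jobParams)
}
println "Runs: ${runs}"
```

Stopping on a missing count is a deliberate choice here: defaulting to 0 instead would re-trigger the job forever if the test script never wrote the key.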

The Takeaways

Having been through this now, I am pretty confident I've got the fundamentals down, more or less. A couple of things I take away from this experience:

Don't mix and match scripted and declarative pipelines. It's not worth it. You may one day develop a karmic understanding of which rules apply in which contexts, but your colleagues who come after you will lose hair, sleep, and possibly the will to live.
If you use a parameters { } block, do NOT ever think you can do a quick re-configuration of the defaults via the Web UI. The next run of the pipeline will blat those changes and leave you wondering. I could expound on that, but it would take a while. Set your defaults in-script. Change the defaults via code change. End-of.
The internet is replete with examples that don't really specify whether they're for scripted or declarative pipelines. The key way to tell which one you are in is whether your block starts with pipeline { } (declarative) or node { } (scripted). Most documentation is for declarative, which also stands to be simpler to maintain. If your pipeline design forces you into scripted pipelines, you may want to rethink the design before committing to that style: the temptation to implement N levels of complexity in scripted pipelines can be very, very strong, and future-you (and Co.) will not thank you.
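For reference, here is the same trivial job written both ways; the opening block is the giveaway. These are two separate, minimal Jenkinsfiles written for illustration, not taken from the pipelines above. The script { } blocks used in both pipelines above are declarative's escape hatch into scripted-style Groovy.

```groovy
// --- Jenkinsfile A: declarative, starts with pipeline { } ---
// The structure (agent, stages, stage, steps) is fixed and validated
pipeline {
    agent any
    stages {
        stage('Hello') {
            steps {
                echo 'Hello from a declarative pipeline'
            }
        }
    }
}

// --- Jenkinsfile B: scripted, starts with node { } ---
// Ordinary Groovy from here on; stages are just function calls
node {
    stage('Hello') {
        echo 'Hello from a scripted pipeline'
    }
}
```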

DEV Community