keinzeit

Posted on Nov 30, 2022

trigger an action based on a push

#devops

general information scoping

we're using gitlab, and i found this help doc to start with:
https://docs.gitlab.com/ee/ci/triggers/

okay, so off-the-bat concepts i'm seeing are pipeline, trigger token, and api call? sounds like we need to have a pipeline set up that only runs when someone/something authorized to trigger it (i.e. holding a trigger token) makes the api call. so where do we set up a pipeline?
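to make the "trigger token + api call" idea concrete for myself, here's a rough python sketch of what that call might look like. it only builds the request and doesn't send it. the instance url, project id, token, and branch name are all placeholders i made up, and the endpoint path is my reading of the triggers doc linked above.

```python
import urllib.parse
import urllib.request


def build_trigger_request(instance_url, project_id, trigger_token, ref):
    """Build (but do not send) the POST request that asks GitLab to run a pipeline."""
    # endpoint as i understand it from the pipeline triggers doc
    url = f"{instance_url.rstrip('/')}/api/v4/projects/{project_id}/trigger/pipeline"
    # the trigger token and the branch/tag to run against go in as form fields
    body = urllib.parse.urlencode({"token": trigger_token, "ref": ref}).encode()
    return urllib.request.Request(url, data=body, method="POST")


# placeholder values, not real ones
request = build_trigger_request("https://gitlab.example.com", 123, "TRIGGER_TOKEN", "dummy-branch")
print(request.get_method(), request.full_url)
# to actually fire it: urllib.request.urlopen(request)
```

the point being that whoever holds the token can kick off the pipeline from anywhere that can make an http request.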

hmm, there's a specific article called merge request pipelines that may be more relevant to the task i wanna accomplish. seems like there are two kinds of pipelines discussed here:

branch pipelines
merge request pipelines

branch pipelines run when you push a commit to a branch, and merge request pipelines run when you create a merge request for a branch. branch pipelines seem to be the simpler of the two, and definitely something i want to explore working with, so why don't we try setting up one on a dummy branch for now?
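as far as i can tell, a job can find out which kind of pipeline it's running in from gitlab's predefined CI/CD variables. a small sketch, assuming the job runs python and that `CI_PIPELINE_SOURCE`, `CI_COMMIT_BRANCH`, and `CI_MERGE_REQUEST_IID` are set the way the docs describe (the function name is just mine):

```python
import os


def describe_pipeline(env):
    """Say which kind of pipeline a job is in, based on GitLab's predefined variables."""
    source = env.get("CI_PIPELINE_SOURCE", "")
    if source == "merge_request_event":
        # merge request pipelines carry the merge request's project-level id
        return "merge request pipeline for !" + env.get("CI_MERGE_REQUEST_IID", "?")
    if source == "push":
        # branch pipelines carry the name of the branch that was pushed to
        return "branch pipeline for " + env.get("CI_COMMIT_BRANCH", "?")
    return "some other pipeline source: " + (source or "unknown")


# outside of a gitlab job none of these variables exist, so this falls through to "unknown"
print(describe_pipeline(os.environ))
```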

Creating a pipeline:

me mucking around the dashboard of my gitlab repository.
Going down CI/CD > Editor > Configure Pipeline on the side panel, Gitlab sets up a .gitlab-ci.yml file to be added to the base of the repo in that branch. seems like this .yml is how gitlab knows how to run stuff

Reading the .gitlab-ci.yml template, seems like you can define different stages in the pipeline and different jobs to run in each stage. stages are executed sequentially, and jobs in a stage are executed in parallel.
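to check my understanding of "stages in sequence, jobs in parallel", here's a toy python model of that ordering. this is not how gitlab implements it, just the scheduling idea; the stage and job names are made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# hypothetical pipeline: stage name -> jobs in that stage
PIPELINE = {
    "build": ["build-job"],
    "test": ["unit-test-job", "lint-test-job"],
    "deploy": ["deploy-job"],
}


def run_job(name):
    # stand-in for whatever a job's script would actually do
    return f"{name} finished"


def run_pipeline(pipeline):
    log = []
    for stage, jobs in pipeline.items():    # stages: one after another
        with ThreadPoolExecutor() as pool:  # jobs within a stage: side by side
            results = list(pool.map(run_job, jobs))
        # leaving the `with` block waits for every job, so the next
        # stage only starts once this one is completely done
        log.append((stage, results))
    return log


for stage, results in run_pipeline(PIPELINE):
    print(stage, results)
```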

so now that i have a .gitlab-ci.yml file, i should be able to push a new commit to the remote, and gitlab will run the pipeline specified in the .yml when the remote receives the push.

WELL ACTUALLY, upon committing the .yml to the branch, gitlab already wants to run the pipeline. but it's stuck rn because we have no active runners! so how do we get an active runner?

Setting up a runner

Gitlab says "Go to project CI settings"
And then I see this:

Ahh, things are making sense now. I can either set up my own runner, or I can use a runner that's shared across the Gitlab instance (presumably managed by the maintainer of the instance, aka my organization). If we have shared runners, my guess is that would be the simplest thing to use. I do not think we have anything sensitive to protect either (i should check in with my boss on this to be sure though).

so now to check for a shared runner on my organization's gitlab instance

welp, guess that idea is out the window.

so there is also the concept of a group runner, but this project doesn't belong to a group. i'm guessing this is referring to how the repo is basically just my private repository (lmao) on the instance instead of being assigned to a group. so obviously no group runners available.

seems like setting up a specific runner is the only way remaining.

Setting up a specific runner

following the instructions to set up a runner, seems like i need to decide on a location on which to install the runner. i wonder if it's okay to start with my machine for now (and perhaps a docker instance on my machine!). another idea would be a virtual server running on my organization's resources.

the runner itself kickstarts the pipeline, but it doesn't actually have to be the entity that's computing the pipeline. at one point in development, fitting a model took 52GB of RAM. we had to use the organization compute cluster to run the damn thing. in this instance, the runner would kickstart the model fit to run on the compute cluster.

so you would have three entities: gitlab, runner, and compute cluster, which would be related below:

An order of operations courtesy of my colleague:

Some specified trigger in the code repo in GitLab triggers a pipeline on the runner
A job with the current codebase retrieves the input data it's supposed to run with
Computation is either done on the runner directly, or indirectly via the compute cluster
Results are obtained and are saved as artifacts of the pipeline
After experimentation is complete, the model outputs with the best results are put into version control
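the middle steps of that list (get the input data, compute, save the results) could look something like this as a job script. it's a sketch under heavy assumptions: every function here is a hypothetical stand-in, the "model fit" is just a mean, and the cluster hand-off is deliberately left unimplemented because i don't know yet how we'd do it.

```python
import json
from pathlib import Path


def fetch_input():
    # hypothetical: stand-in for retrieving the real input data
    return [1.0, 2.0, 3.0, 4.0]


def fit_locally(data):
    # hypothetical "model fit" done on the runner itself: just the mean
    return {"mean": sum(data) / len(data)}


def submit_to_cluster(data):
    # hypothetical: a real version would hand the work to the compute cluster
    raise NotImplementedError("cluster submission is not sketched here")


def run_job(artifact_dir, use_cluster=False):
    data = fetch_input()
    # compute either directly on the runner or indirectly via the cluster
    results = submit_to_cluster(data) if use_cluster else fit_locally(data)
    # write the results to a directory the pipeline could keep as artifacts
    out = Path(artifact_dir) / "results.json"
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(results))
    return out


print(run_job("artifacts").read_text())
```

gitlab would only keep that `artifacts` directory if the job lists it under `artifacts: paths:` in the .gitlab-ci.yml.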

However, we're in the business of starting off by keeping things simple. I should do a first-pass by installing Gitlab runner on my machine and trying to have it kickstart the default (useless) pipeline that Gitlab provides.

installing a runner on my local machine

referring to these two guides
https://docs.gitlab.com/runner/install/
https://docs.gitlab.com/runner/install/windows.html

okay, in the middle of the guide in the second link, i'm told to register a runner via this guide, but at the top of that guide it says that registering a runner is deprecated and that we should use a token architecture. hoo-boy. i'll check out the token architecture ~~to see if i can follow that guide and then proceed with the runner installation guide~~ (nevermind, seems like it's not out yet, and we'll need to stick to the current way of doing it)

so i need a registration token (which i have, opening Settings > CI/CD in my project repo created one for me automatically). now I need to figure out whether to follow the Linux or Windows instructions. I'll try the Windows instructions first.

Ah yes, it should be the Windows instructions. It tells me to run the command .\gitlab-runner.exe register, which tells me i should be executing the .exe i downloaded earlier (it was from one of the two links above).

okay so i ran the .exe above, and basically i hand the utility the instance URL and the registration token, and that's it. the most confusing thing though, is what executor to enter. Gitlab docs says to enter docker in most cases, which makes a lot of sense, but if i weren't using docker, what would i use? gonna check out the executors doc.

seems like it's either shell or docker. they seem to be really suggesting docker (understandably so), so let's just try docker. it's probably pretty easy to set up anyway.

hmm, the last step of the .exe needs me to name the default docker image. seems like it's [language]:[version]? i entered python:3 cause we run python, and there's an official python docker image.
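for future me: the same answers i typed into the prompts can apparently be passed as flags instead. here's a sketch that just assembles that command as an argument list without running it. the flags are my understanding of the runner docs, and the url and token are placeholders.

```python
def build_register_command(instance_url, registration_token, image="python:3"):
    """Assemble a non-interactive `gitlab-runner register` call as an argument list."""
    return [
        r".\gitlab-runner.exe", "register",
        "--non-interactive",                          # don't prompt, take everything from flags
        "--url", instance_url,                        # the gitlab instance url
        "--registration-token", registration_token,   # from the project's CI/CD settings
        "--executor", "docker",                       # what the docs suggest in most cases
        "--docker-image", image,                      # default image for jobs that don't name one
    ]


# placeholder values, not real ones
command = build_register_command("https://gitlab.example.com/", "REGISTRATION_TOKEN")
print(" ".join(command))
# to actually run it: subprocess.run(command, check=True)
```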

so now that i'm done with the runner utility, i guess i should either

1. install a docker image somewhere on my machine which the registered runner will use? and then maybe i need to update the default docker image that i specified when i registered the runner
2. go back to earlier instructions: install docker first, then follow the docker instructions to register a runner

well, let's try number 1 first. we'll start by attempting to install docker

installing docker

referring to this guide

there's a lot of text. main mental hurdle i'm running into is whether to choose wsl2 backend or hyper-v. reading the explanations over a few times is making my head spin, so i go ahead and download the executable and hope that the installer makes it easy to figure out

i run the installer, and i'm immediately greeted with this:

seems like i need to make the wsl2 vs. hyper-v decision now. since they recommend wsl2, i will go with it. the only reservation i have with wsl2 is that i might need to remember wsl credentials, which isn't a terrible pain but i certainly need to know them.

i click ok.

i'm watching the installer run, and i'm wondering if i needed to have installed wsl2 before installing docker. oh well, we'll see soon enough if that's the case.

Success! i might still need to enable wsl2, but for now let's restart the computer.

ok, restart done, i accept the docker terms, and i'm told the wsl2 install is incomplete. but they conveniently give me a link with instructions on how to complete that, so i'll follow it.

it has me update the wsl2 linux kernel and install a copy of linux (i chose the latest LTS version of Ubuntu, 22.0x). installing Ubuntu indeed requires me to set up credentials for it.

i restart docker. this time, i get an error and a nasty stack trace. crap. looks like i'll need to debug this stack trace tomorrow.
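before digging through the stack trace, one quick sanity check might be whether the docker command line can reach the docker engine at all. a minimal sketch, assuming the installer put `docker` on my PATH and that `docker info` exits with an error when the engine isn't reachable (the function name is mine):

```python
import shutil
import subprocess


def docker_status():
    """Return (ok, message) describing whether the docker CLI can reach the engine."""
    if shutil.which("docker") is None:
        return False, "docker is not on PATH"
    try:
        result = subprocess.run(
            ["docker", "info"], capture_output=True, text=True, timeout=60
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        return False, f"could not run docker: {exc}"
    if result.returncode != 0:
        # pass along whatever docker complained about, if anything
        return False, result.stderr.strip() or "docker info failed"
    return True, "docker engine is reachable"


ok, message = docker_status()
print(ok, message)
```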

tl;dr: wanted to set up a pipeline that triggers when a new commit is pushed to a branch, learned that i need to set up my own runner in order to do that, got as far as installing docker (which my runner will use as its executor) before running into problems using it.