This is 3 part series documenting my journey through Software development. For Journey Through DevOps - Part 2: The Awakening, click here
This post will be an insight into the automation that we use in our projects. This is not a tutorial but a demonstration of how automation can change the software process.
Background
We manage the deployment of 2 applications - Chatwoot and Rasa-X. We aim to deliver content through the Natural Language Processing framework of Rasa and use Chatwoot to track and intervene in the chat, if necessary.
Technology stack
- Cloud: AWS
- Deployment: Helm, Kubernetes(1.21, Elastic Kubernetes Service)
- Datastore: Postgres, Redis (AWS managed)
- Applications: Rasa-X(Python), Chatwoot(Ruby on Rails)
- VCS: Gitlab
Setup
Kubernetes
We use AWS EKS for deploying our applications. The pods run on both FarGate and Managed Node-groups to get the best of both worlds. Since the micro-services of chatwoot are stateless we use FarGate, whereas the Rasa-X deployment uses node groups for certain pods that need fewer resources to run and also need to be stateful.
Infrastructure
For AWS infrastructure, we use Terraform. The repo is connected as a Version control system to Terraform cloud allowing us to focus more on the infrastructure itself instead of having to CI pipelines.
Helm
We use Helm charts for both Rasa-X and Chatwoot, latter of which was built by us. Helm allows us to have a standard way to deploy the application instead of a lot of YAML files which need modification every time there's an update.
The Automation
Automation by its nature, is intuitive. General rule of thumb that we follow, is that if a particular task is done more than 3 times, it is best to automate it.
Let's start with the lightweight components.
Chatwoot
Since chatwoot is an open source project and we don't add a lot of custom code to chatwoot, there isn't any need for CI/CD pipelines for chatwoot. Every update, it's easier to run
helm upgrade <release-name> chatwoot/chatwoot
Infrastructure
Since we're using Terraform cloud, this is also very simple. Although we did consider using Terraform locally in pipelines, this was an overhead considering as our team was small. Note that you can use Terraform cloud for both execution of Terraform and storing the Terraform state alone.
Rasa-X
This is the component that involves the maximum number of repetitions, and therefore is fully automated.
Rasa-X has several components but we'll focus on the important ones for now, such as Rasa open source and Rasa Custom Actions server. Former is the NLP part of Rasa and the latter is a python server that lets you run custom events in your chatbot such as fetching data from a database. Since chatbots are iterative, it was imperative that we automate this process before moving ahead.
CI/CD
This part requires you to repetitively train NLP models and test them just as much. Hence use gitlab-ci pipelines to train a model, subsequently test it and upload the results as artefacts.
build-actions:
image: docker:20.10.7
stage: build
services:
- docker:20.10.7-dind
script:
- docker login -u $CI_REGISTRY_USER -p $CI_REGISTRY_PASSWORD $CI_REGISTRY
- chmod +x ./ci.sh
- ./ci.sh
train-job:
stage: train
script:
- rasa train --fixed-model-name $CI_COMMIT_SHORT_SHA
artifacts:
paths:
- models/
expire_in: 1 day
data-validate-job:
stage: validate
script:
- rasa data validate
core-test-job:
stage: test
script:
- rasa test core --model models/ --stories test/ --out results
nlu-test-job:
stage: test
script:
- rasa test nlu --nlu data/nlu.yml --cross-validation
upload-job:
stage: report
script:
- echo "Upload results"
artifacts:
paths:
- results/
This can be broken into 2 parts. The custom actions server part and the Rasa bot.
Custom actions is a python server, so we dockerize to allow for easier development and standardisation. The "./ci.sh" is a shell script to read the branch name from the environment variable and set the docker image tag accordingly, to build and push it to the registry.
The next parts of this pipeline, train a model, validate that model for any conflicts and errors and finally run tests on them. All the tests are uploaded as artefacts in the pipeline. This allows us to log every model and iteration. The --fixed-model-name tag is to ensure that models have a predictable name which can be used further in the pipeline. Rasa defaults to using Timestamp which can be unpredictable.
Branches:
- main: This is the branch that represents production, so every piece of code that resides here is battle tested and verified. Hence it is safe to assume that with every push into this branch, we can update the NLP model in the server.
Docker tag for custom actions: stable
deploy-job:
stage: deploy
#before_script: []
#image: registry.gitlab.com/gitlab-org/cloud-deploy/aws-base:latest # see the note below
script:
- apt-get update
#- apt install git-all -y
- curl -k -F "model=@models/$CI_COMMIT_SHORT_SHA.tar.gz" "http://rasax-url.com/api/projects/default/models?api_token=$RASAXTOKEN"
- echo "Application successfully deployed."
- aws eks update-kubeconfig --name ${EKS_CLUSTER_NAME}
- curl https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3 | bash
- helm plugin install https://github.com/rimusz/helm-tiller --kubeconfig=$HOME/.kube/kubeconfig
- helm repo add rasa-x https://rasahq.github.io/rasa-x-helm
- helm upgrade rasa rasa-x/rasa-x -n rasa --set app.name=$CI_REGISTRY/weunlearn/wulu2.0/rasa_actions --set "app.tag=stable" --reuse-values # Redeploys the kubernetes deployment with a new image name while reusing already existing values
rules:
- if: '$CI_PIPELINE_SOURCE == "push" && $CI_COMMIT_BRANCH == "main"'
This stage in gitlab pipelines that is only executed when the event is a push event and the branch committed to is main. This prevents any unnecessary triggering of model training for minute changes in various other branches.
- develop: Since this is a non-production environment, there is no deployment from this branch. The deploy stage from the previous section is not executed. However, since this branch serves as a reference point before updating production code in the main branch, it has a static name.
The test results are uploaded as before. This allows us to keep track of every model iteration.
Docker tag for custom actions: develop
Workflow
Pushes into non-main, branches:
Pushes into main
The pipeline is the same as before except it has a final stage
Conclusion
This setup enables a tech team of 2 people, to manage multiple applications in our production environment, while also ensuring a standardised way to perform quality control. Now the goal of the tech team is to solve problems and build the product, as opposed to dedicating a significant share of time to figuring out how to manage the product.
Top comments (0)