This post follows our earlier blog post describing our ML Ops manifest. Here we will dive into configuration management within our ML projects.
Background
Working in a real-world business environment requires moving code between research/development, test, and production environments, and doing so smoothly is crucial for development velocity. While doing so, it is important to maintain a common language and standards between the various AI and development teams, for frictionless deployment of code. Additionally, configuration management helps because:
- ML work involves many parameters (hyperparameters, pipeline settings, etc.)
- We want to separate configuration from code (the 12-factor app principle)
These days, it comes as no surprise that there are several open-source (OS) configuration frameworks that can be utilized for this. After reviewing several options (including Hydra), we decided on dynaconf, since it fulfilled our requirements of being:
- Python based
- Simple
- Easily configurable and extendable
- Able to override and cascade settings
Development Environment
Artlist runs in multiple cloud environments, but currently most of the ML workloads run on GCP. Following GCP best practices, we have set up a different project for each environment, allowing for strict isolation between them in addition to enabling billing segmentation. This separation needs to be easily propagated into the configuration for seamless code execution.
In this post we review our configuration in relation to:
- Basic Implementation
- Advanced Templating
- Simple Overriding
- Project vs. Module settings
- Updating Configurations
Now let's see how dynaconf can help out with this.
Basic Implementation
We decided to work with configuration settings stored in external toml files, which are easily readable and are becoming one of the de facto standards in Python.
A code snippet from our basic configuration file is as follows:
[default]
PROJECT_ID = "@format artlist-{this.current_env}"
BASE_NAME = "my_feature_name"
BASE_PIPELINE_NAME_GCP = "@jinja {{ this.BASE_NAME | replace('_', '-') }}"
BUCKET_NAME = "@format {this.BASE_NAME}--{this.current_env}"
[dev]
SERVICE_ACCOUNT="service-account1@artlist-dev.iam.gserviceaccount.com"
[tst]
SERVICE_ACCOUNT="service-account2@artlist-tst.iam.gserviceaccount.com"
[prd]
SERVICE_ACCOUNT="service-account3@artlist-prd.iam.gserviceaccount.com"
Now let's break it down.
Whenever dynaconf runs, it runs in a specific environment. The default environment is called DEVELOPMENT. However, since we wanted to move easily between the environments (and GCP resources), we changed the naming convention of the environments to three-letter acronyms (dev, tst, prd), so we can readily reference the relevant GCP project while specifying the environment.
Using the env_switcher, we can indicate to dynaconf which configuration to load and which GCP project to access with the following line:
PROJECT_ID = "@format artlist-{this.current_env}"
Using @format as a prefix to the string, dynaconf interpolates the parameter within the curly brackets. For example, if the current environment is set to 'dev', the PROJECT_ID variable will be artlist-dev, thus accessing only the resources from the dev project, whereas if the environment is set to 'prd', the PROJECT_ID will be artlist-prd.
Accessing the rest of the relevant variables is based on the various sections in the toml file. For example, the production service account (SA) is referenced by accessing the SERVICE_ACCOUNT variable under the [prd] section.
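To make this concrete, here is a minimal sketch of how such a settings object could be loaded (the file name and printed values are illustrative; ENV_FOR_DYNACONF is dynaconf's default environment-switcher variable):
from dynaconf import Dynaconf

# Load the toml file with environments enabled; the active environment
# is selected via the ENV_FOR_DYNACONF environment variable.
settings = Dynaconf(
    settings_files=["settings.toml"],
    environments=True,
    env_switcher="ENV_FOR_DYNACONF",
)

# With ENV_FOR_DYNACONF=dev:
print(settings.PROJECT_ID)       # artlist-dev
print(settings.SERVICE_ACCOUNT)  # service-account1@artlist-dev.iam.gserviceaccount.com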
Advanced Templating
Dynaconf includes the ability to work with Jinja templating, which is useful for manipulating strings. GCP has a quirk that disallows '_' (underscore) as a separator when naming containers within the GCP registry, requiring '-' (hyphen) instead. Since we wanted to keep our registry in sync with the artifacts coming out of Vertex AI pipelines (which are stored in buckets / Cloud Storage), we were able to keep the Python naming convention of '_' while converting the strings to the GCP convention when required.
Using Jinja's replace filter we can easily alter the text as necessary:
BASE_PIPELINE_NAME_GCP = "@jinja {{ this.BASE_NAME | replace('_', '-') }}"
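As a rough illustration, with BASE_NAME = "my_feature_name" from the snippet above, the templated value resolves as follows:
print(settings.BASE_NAME)               # my_feature_name  (Python convention)
print(settings.BASE_PIPELINE_NAME_GCP)  # my-feature-name  (GCP convention)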
Simple Overriding
Another useful feature of dynaconf is that you can easily override the configuration using local settings. This is very convenient, since local settings for development don't need to be checked into source control, while the general settings should be synced to the entire team.
All that is required to differentiate between the settings is to add the .local suffix to the file name, see the example below:
- General Settings - settings.toml
- Local Settings - settings.local.toml
Whenever dynaconf identifies the .local.toml suffix, it will override the values defined in settings.toml with those loaded from the settings.local.toml file.
An example of overriding with local credentials is sketched below.
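This sketch assumes a developer wants to use a personal service account in the dev environment (the value shown is hypothetical):
# settings.local.toml -- not checked into source control
[dev]
SERVICE_ACCOUNT = "my-personal-sa@artlist-dev.iam.gserviceaccount.com"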
Project vs. Module settings
Our ML framework is Kubeflow (hosted by GCP Vertex AI Pipelines), which requires various configurations: some at the component level (components are reused independently in various pipelines), while others are at the pipeline/project/cross-component level. To load both settings, we can use another feature of dynaconf, which can define a specific file name template that will automatically be loaded by dynaconf. Here is our implementation practice (a loading sketch follows the list):
- Any configurations at the project level are written in the project settings, settings_prj.toml (see the dynaconf settings_files configuration).
- Any configurations at the component level are written in the component settings, settings.toml.
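A minimal sketch of loading both files together (in dynaconf, values from files listed later override those from earlier files):
from dynaconf import Dynaconf

# Project-level settings load first; component-level settings load
# second and can refine or override the project-level values.
cfg = Dynaconf(
    settings_files=["settings_prj.toml", "settings.toml"],
    environments=True,
)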
Updating Configurations
Sometimes there is a need to update the configuration at runtime. This can be challenging, since the entire configuration is loaded as soon as the library is called. To do so, we can use a decorator to update the configuration. Assuming cfg is the configuration settings, we can write the following decorator:
from functools import wraps

def input_to_config(config, sequence_override=True):
    """[decorator] override config parameters with function inputs
    Args:
        config: Dynaconf configuration / settings to be updated
        sequence_override: configures the option for overriding keys or merging the values
    """
    def decorator(wrapped_func):
        @wraps(wrapped_func)
        def inner(*args, **kwargs):
            # Capture the wrapped function's keyword arguments and push
            # them into the configuration before the call.
            _override_config(kwargs, config, sequence_override=sequence_override)
            return wrapped_func(*args, **kwargs)
        return inner
    return decorator
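Usage is then a matter of decorating the function whose inputs should be mirrored into the configuration. A hedged sketch (run_pipeline and its parameters are hypothetical, and _override_config is our internal helper, not shown here):
@input_to_config(cfg)
def run_pipeline(batch_size=32, learning_rate=0.001):
    # By the time this body runs, the caller's keyword arguments have
    # already been pushed into cfg by the decorator.
    ...

run_pipeline(batch_size=64)  # cfg is updated with batch_size=64 before the call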
Summary
In this blog post, we have laid out our configuration implementation using the dynaconf library. We saw how we:
- Used the basic setup of dynaconf configuration
- Synced our GCP projects with dynaconf environments
- Worked with the advanced dynaconf settings
In our next posts, we will extend our description of the various elements that have been incorporated into our ML project workflow while developing our internal base library, which includes standardization of:
- Logging
- Accessing our secrets (using GCP Secret Manager)
- Experiment tracking (using ClearML)
# Image copyright
The banner image was co-created using DALL·E 2.