Configuration hell
Sometimes it's the trivial parts that confuse me. They get portrayed as things you shouldn't spend too much time on, yet a structured configuration can really save you some headaches.
After realizing that a few harmless config differences had accumulated on my prod environment from different sources (docker-compose, the Dockerfile, env vars), I finally found the time for something I had been meaning to do for a while: restructuring my Flask app's configuration. The trigger was that I had started to assume things, and it's dangerous when you start to assume things. Having a single source of truth for your config variables is important to me. Otherwise you get lost in a sea of config, trying to understand which variable gets overwritten or ignored by which file.
Before this refactor I had been using python-dotenv and a Python file called config.py with multiple separate classes such as BaseConfig, DevConfig, ProdConfig etc. to manage different configurations for different environments. At first I knew what I was doing, but at some point it got complicated and I began to mix up what belonged where.
How to choose the right config file format?
There are several configuration file formats out there: JSON, YAML, TOML, INI, .env files and similar. There is no perfect solution, and it looks like different library developers have different ideas about what the perfect configuration file should look like.
For one thing, JSON doesn't allow comments, which initially sounded like a deal-breaker to me. TOML becomes overly verbose once the data gets deeply nested. I can't speak for YAML; maybe it's the right choice for you. There are strong opinions on each one, so look for the one that suits you.
Personally, my decisions ultimately come down to how simple the tool is: start simple and try more complicated options only as you need them. So I went with JSON, even though it doesn't allow comments. That's fine. If I ever really crave comments, I can think of another solution then. But as I said, stick to your simple solution unless you absolutely can't do without.
Previously on "Configuration Hell"
This was my configuration structure before the refactor. I'm using the factory pattern in Flask to make creating an app with different configurations easier; this method also simplifies unit testing immensely.
# app/__init__.py
from flask import Flask

def create_app(config_type: str = "DevConfig") -> Flask:
    app = Flask(__name__)
    # Load the configuration.
    app.config.from_object(f"config.{config_type}")
    ...
    return app
# config.py
import os
import pathlib

class Config:
    """
    Common settings for all environments.
    """
    STATIC_FOLDER = pathlib.Path(__file__).parent / "app" / "static"
    UPLOAD_FOLDER = STATIC_FOLDER / "uploads"
    PDF_FOLDER = STATIC_FOLDER / "pdfs"
    THUMBNAILS_FOLDER = STATIC_FOLDER / "thumbnails"
    MAX_CONTENT_LENGTH = 16 * 1000 * 1000
    TESTING = False
    ...

class DevConfig(Config):
    SQLALCHEMY_DATABASE_URI = os.environ.get("SQLALCHEMY_DATABASE_URI", "mysql+pymysql://user:pwd@mariadb:3306/mydatabase")
    ...

class ProdConfig(Config):
    SQLALCHEMY_DATABASE_URI = os.environ["SQLALCHEMY_DATABASE_URI"]
    ...

class TestConfig(Config):
    TESTING = True
    SQLALCHEMY_DATABASE_URI = os.environ.get("SQLALCHEMY_DATABASE_URI", "sqlite:///")
    ...
The Config class holds the common configuration variables that won't change from env to env, and the classes that follow hold configuration variables specific to their respective environments.
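As a side note on why the factory pattern helps with testing: because create_app takes the config as a parameter, a test suite can spin up an isolated app per test. A minimal sketch with pytest (the fixture layout is illustrative, not my actual suite):

# tests/conftest.py -- a minimal sketch, not my actual test suite
import pytest

from app import create_app

@pytest.fixture
def app():
    # TestConfig falls back to an in-memory SQLite database.
    return create_app("TestConfig")

@pytest.fixture
def client(app):
    # Flask's built-in test client; no running server needed.
    return app.test_client()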
The confusion I was having here: when do the env variables need to be supplied? When I'm developing on dev, there can be some sane defaults. When I'm deploying to prod, those sane defaults should definitely be ignored, because they are specific to the dev environment. I also want the application to yell at me when I'm on production and I haven't supplied the database connection string, for instance. There mustn't be any default value there; it has to be supplied consciously (that's why ProdConfig reads os.environ[...] with no fallback: a missing variable raises a KeyError at startup). There are also several different config possibilities depending on whether I'm working with Docker or not. I want to see each of them clearly and detect the differences at a single glance, instead of looking under docker-compose, the AWS task definition and several other locations at the same time, trying to merge them in my head.
Refactor
In order to make things clear, I created an instance folder to hold all of my JSON configurations. This folder includes configurations for all the environments, that is dev.json, prod.json, docker-dev.json, docker-prod.json, test.json and finally an example file called config.json, which should behave as a template for all the other files.
We will commit the config.json file to version control because it doesn't contain any sensitive information, just some fake data. I also committed test.json, because in my case it doesn't contain secrets either; I use it to run my unit/integration tests on GitHub Actions.
Now my instance folder looks like this:
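instance/
├── config.json
├── dev.json
├── docker-dev.json
├── docker-prod.json
├── prod.json
└── test.json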
And my factory function looks like this:
# app/__init__.py
import json
from typing import Optional

from flask import Flask

def create_app(env: Optional[str] = "prod") -> Flask:
    """Create flask app with the given configuration."""
    app = Flask(__name__, instance_relative_config=True)
    # Load the class-based defaults, then the environment's JSON file on top.
    app.config.from_object("config.Config")
    app.config.from_file(f"{env}.json", load=json.load)
    return app
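Flask applies the two loads in order, so anything in the JSON file overrides the class defaults. With that in place, pointing a local run at the matching file is a one-liner. A sketch of a development entry point (run.py is an illustrative name, not part of the original setup):

# run.py -- illustrative local development entry point
from app import create_app

# Loads config.Config, then instance/dev.json on top of it.
app = create_app(env="dev")

if __name__ == "__main__":
    app.run(debug=True)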
Here is my config.py file. It's still there to hold the configuration logic that doesn't fit in JSON: I don't want to hardcode file paths, I want them to be relative to config.py in the repo.
# config.py
import pathlib

class Config:
    # Upload folder under static.
    STATIC_FOLDER = pathlib.Path(__file__).parent / "app" / "static"
    UPLOAD_FOLDER = STATIC_FOLDER / "uploads"
    PDF_FOLDER = STATIC_FOLDER / "pdfs"
    THUMBNAILS_FOLDER = STATIC_FOLDER / "thumbnails"
    MAX_CONTENT_LENGTH = 16 * 1000 * 1000

    # Enable this in order to echo all the sql commands to the db.
    # SQLALCHEMY_ECHO = True
This is the instance/config.json file that behaves as an example template for all the other config files.
{
    "SECRET_KEY": "verysecret",
    "PERMANENT_SESSION_LIFETIME": 1800,
    "GOTENBERG_URL": "http://gotenberg:3000/forms/libreoffice/convert",
    "SQLALCHEMY_DATABASE_URI": "mysql+pymysql://user:pwd@localhost:3306/mydatabase",
    "ADOBE_PDF_EMBED_API_KEY": "",
    "ALLOWED_EXTENSIONS": [
        "txt",
        "pdf",
        "png",
        "jpg",
        "jpeg",
        "gif",
        "ppt",
        "csv",
        "html",
        "xls",
        "xlsx",
        "docx",
        "doc"
    ],
    "TESTING": false,
    "WTF_CSRF_ENABLED": true,
    "MAIL_USERNAME": "example@mail.com",
    "MAIL_PASSWORD": "password",
    "MAIL_SERVER": "smtp.server.com",
    "MAIL_PORT": 587,
    "MAIL_USE_TLS": true,
    "MAIL_USE_SSL": false,
    "MAIL_DEFAULT_SENDER": "example@mail.com",
    "SQLALCHEMY_ECHO": false
}
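A nice side effect of loading with json.load instead of a .env file: values arrive with real Python types, not strings. A quick illustrative check (the asserts are just for demonstration, assuming dev.json mirrors the template above):

# Illustrative: json.load preserves types; env vars are always strings.
from app import create_app

app = create_app(env="dev")
assert app.config["MAIL_PORT"] == 587               # int, not "587"
assert app.config["MAIL_USE_TLS"] is True           # real boolean
assert "pdf" in app.config["ALLOWED_EXTENSIONS"]    # real list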
Gunicorn
By having a separate config file for each env, I can now see exactly what differences I have between the environments. When I fire up an app that should use a configuration file called docker-prod.json, that file must exist and it must contain all of the needed vars. That file is the single source of truth for that environment. In the Dockerfile, Gunicorn calls the factory with the right env directly:
CMD gunicorn --bind 0.0.0.0:5000 "app:create_app(env='docker-prod')"
Some struggles with passing the JSON config to the Docker container
Since I'm building my container on GitHub Actions, there's no way for me to include docker-prod.json in the container directly. So I discovered a simple trick that lets me encode the entire config file into a single base64 string (which in turn saves me from passing all of my env variables one by one), put it under GitHub secrets and pull/decode it from there during the build process.
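For completeness, the secret's value is just the file content base64-encoded. You can produce it with the base64 CLI, or with a tiny Python helper like this illustrative one (encode_config.py is a made-up name):

# encode_config.py -- illustrative; prints the string to store as DOCKER_PROD_JSON
import base64
import pathlib

blob = pathlib.Path("instance/docker-prod.json").read_bytes()
print(base64.b64encode(blob).decode("ascii"))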
In the job where I build my container, I have the following logic:
jobs:
  deploy:
    name: Deploy
    runs-on: ubuntu-latest
    environment: production
    steps:
      - name: Checkout
        uses: actions/checkout@v3
      - name: Decode docker-prod.json
        run: echo -n "${{ secrets.DOCKER_PROD_JSON }}" | base64 --decode > $GITHUB_WORKSPACE/instance/docker-prod.json
      ...
What happens here? The checkout action checks out the source code, and the next step decodes the long base64 string from GitHub secrets, writing it to instance/docker-prod.json. When the application eventually starts, my config file is right where it needs to be, without ever being checked into version control.
Conclusion
Alright, this is huge for me. The above trick lets me host my prod config free of charge in GitHub secrets. I'm not sure how much it costs to store your secrets in AWS Secrets Manager, but yeah...
Now, just by looking at the different config files in my instance folder, I instantly see which database connection string I'm using, which email credentials are there and which API keys I'm making use of in each environment. I don't need to go through multiple config files and try to guess which one supersedes the other.
Notes to self
Maybe this is a naive approach, maybe not. Perhaps you'll find some useful points here, perhaps you'll just shrug and say "meh, pretty basic stuff". I'm writing this article to document things for myself, hoping it guides me toward better practices along the way: what I learned, how I learned it, how I applied it, and how it turned out in the end.
I'll stick to this for a while and see how it works out.