Config files are everywhere. There are lots of reasons your app might need to have one:
You have configuration that you want to persist beyond a reboot.
Your configuration represents a physical state; for example, it contains the settings for peripheral devices, a stored procedure for accomplishing a task, or maybe it expresses the layout of the live user interface.
Your app's configuration cannot be easily expressed as a series of variables. CI pipelines, workflows, etc. feature a lot of complex nesting, repeated blocks, and even internal linking.
You want the app to be able to persist its own changes to configuration, such as window sizes, menu settings, or credentials. In this case, the config file functions more as a database than as something the user writes.
In all of these cases, the structure of the config is very important and likely long-lived. Mistakes in your config syntax will be hard to undo, so it pays to have a plan upfront, and design for it to be extended and documented.
In this article, we'll learn how to load YAML config files in a way that is clean, easy to support, and easy to extend. We'll do this by creating our own YAML task automation syntax, which we'll call taskbook files:
# taskbook.yml
group: # name of group
tasks: # list of tasks
  - name: # name of task
    module: # module to use
    options:
      # key / value options
  # ...
We'll write a program to read them, which we'll call Taskable*. When finished, it will be easy to determine which fields are supported, validate config values safely, add more fields for future needs, and even access config values within our program as properties.
*Any similarity to Ansible playbook syntax, real or imagined, is purely coincidental. 😂
Create a command line tool
Let's create a file called taskable.py to contain our implementation of Taskable:
# taskable.py
import argparse


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("file", type=argparse.FileType("r"))
    args = parser.parse_args()


if __name__ == "__main__":
    main()
This provides the scaffolding for an argparse command line interface (for more info, see our article on Python CLIs).
You can run the script as follows:
$ python3 taskable.py
usage: taskable.py [-h] file
taskable.py: error: the following arguments are required: file
To be able to read in a file, we need to create the file first, which we'll do next.
Create a taskbook file
I'll be using YAML for the config files because it's easy to read and I'm comfortable with it, but you can easily support JSON or TOML, as they offer similar APIs.
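As a rough illustration of that similarity, here is a minimal sketch that parses the same shortened taskbook from JSON and from TOML using only the standard library. The JSON and TOML renderings of the taskbook are my own assumption about how the format would translate, and tomllib is only available on Python 3.11 or newer, so the sketch skips the TOML half on older versions.

```python
import json

try:
    import tomllib  # in the standard library since Python 3.11
except ModuleNotFoundError:
    tomllib = None

# The same (shortened) taskbook expressed in JSON and in TOML.
JSON_TEXT = """
{
  "group": "localhost",
  "tasks": [
    {"name": "enable the service",
     "module": "lettuce.service",
     "options": {"enable": true, "start": true}}
  ]
}
"""

TOML_TEXT = """
group = "localhost"

[[tasks]]
name = "enable the service"
module = "lettuce.service"

[tasks.options]
enable = true
start = true
"""

from_json = json.loads(JSON_TEXT)
print(from_json["tasks"][0]["module"])

if tomllib is not None:
    from_toml = tomllib.loads(TOML_TEXT)
    # Both parsers hand back the same nested dicts and lists.
    print(from_toml == from_json)
```

Because every parser returns plain dictionaries and lists, the rest of the approach in this article should not care which file format the data came from.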
Create a taskbook.yml file and add the following:
# taskbook.yml
group: localhost
tasks:
  - name: copy file.txt to the place
    module: saucy.copy
    options:
      source: file.txt
      dest: /etc/file.txt
  - name: install a package
    module: cheesy.package
    options:
      name:
        - fzf
        - tree
      upgrade: true
  - name: enable the service
    module: lettuce.service
    options:
      enable: true
      start: true
At this point, we'll be able to run the following:
python3 taskable.py taskbook.yml
However, nothing will happen because our app doesn't print anything yet.
Read in the YAML file
YAML files are easy to read with Python. There are multiple libraries available, but pyyaml is the de facto standard and is often installed on whatever system you're already on.
If you don't have pyyaml (or you're using a virtual environment because you're awesome), install it now:
pip install pyyaml
Then, in your taskable.py file, import the yaml package and read in the YAML file:
import yaml
...
data = yaml.safe_load(args.file)
Our taskable.py file so far:
# taskable.py
import argparse

import yaml


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("file", type=argparse.FileType("r"))
    args = parser.parse_args()
    data = yaml.safe_load(args.file)


if __name__ == "__main__":
    main()
At this point, you will be able to read in the YAML file, but there's still no output just yet. We could stop here and access its values as nested dictionaries and arrays, like so:
data["tasks"][0]["module"]
...but there are a couple of problems with this.
First, there's no validation at all, so a malformed config file has unpredictable results. Second, the keys are plain strings, which are opaque to tooling: IDE auto-completion won't work, renaming a field means manually searching through the code, and a misspelled field name won't be caught until that line runs.
No, we can do a lot better, and we will, starting by building a model of our data in the next section.
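Before moving on, here's a small sketch of how both problems show up in practice. The data dictionary is a hand-written stand-in for what yaml.safe_load() might return, and the missing field and the misspelled key are made up for illustration.

```python
# Stand-in for what yaml.safe_load() might return for a taskbook
# in which the second task is missing its "module" field.
data = {
    "group": "localhost",
    "tasks": [
        {"name": "copy file.txt to the place", "module": "saucy.copy"},
        {"name": "install a package"},
    ],
}

problems = []

# A misspelled key in our own code ("task" instead of "tasks")
# is only discovered when this line actually runs.
try:
    data["task"][0]["module"]
except KeyError as exc:
    problems.append(f"typo in code: missing key {exc}")

# A field missing from the file surfaces wherever it is first used,
# which may be far from the code that loaded the file.
for task in data["tasks"]:
    try:
        task["module"]
    except KeyError as exc:
        problems.append(f"{task['name']!r} is missing key {exc}")

for problem in problems:
    print(problem)
```

Neither mistake is reported when the file is loaded; both only surface once the offending lookup executes.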
Create the data model
We need a way to express our data format so that it's functional. For this purpose, I prefer to use attrs, which supports data validation, creates memory-efficient slotted classes by default, lets us access our fields as properties with dramatically less boilerplate, and more.
Let's install attrs:
pip install attrs
Then add the following to your taskable.py file:
from typing import Any
...
from attrs import define, field


@define
class Task:
    name: str
    module: str
    options: dict[str, Any] = field(factory=dict)


@define
class Taskbook:
    group: str
    tasks: list[Task]
Our taskable.py file so far:
# taskable.py
import argparse
from typing import Any

import yaml
from attrs import define, field


@define
class Task:
    name: str
    module: str
    options: dict[str, Any] = field(factory=dict)


@define
class Taskbook:
    group: str
    tasks: list[Task]


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("file", type=argparse.FileType("r"))
    args = parser.parse_args()
    data = yaml.safe_load(args.file)


if __name__ == "__main__":
    main()
These two classes, Task and Taskbook, fully express the taskbook format. We won't instantiate them ourselves though, because we'll learn a method to do so automagically in the next section.
Structurize into models
"Structurize" is a $6 word (that I may have made up) that translates to, "load all your data into fancy model classes." I'm using it because "de-serialize" sounds awful and is harder to type. 😝
The easiest way to structurize your YAML data into attrs classes is by using the cattrs package. The simplest usage looks like this:
import cattrs
taskbook = cattrs.structure(data, Taskbook)
Let's add it to our taskable.py file:
# taskable.py
import argparse
from typing import Any

import cattrs
import yaml
from attrs import define, field


@define
class Task:
    name: str
    module: str
    options: dict[str, Any] = field(factory=dict)


@define
class Taskbook:
    group: str
    tasks: list[Task]


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("file", type=argparse.FileType("r"))
    args = parser.parse_args()
    data = yaml.safe_load(args.file)
    taskbook = cattrs.structure(data, Taskbook)


if __name__ == "__main__":
    main()
That's all you need! cattrs will load the data into attrs classes after only being given the expected top-level class, which is Taskbook here.
If you need to tweak the behavior, cattrs provides a hook mechanism. It's a bit cumbersome, but it's easier than writing all the structurization code from scratch.
In the next section, we'll work on doing something useful with our data.
Use the data
At this point, we've fully structurized our data into classes, which means we can access our config data like this:
taskbook.tasks[0].module
This makes our code much easier to read and work with. Now we'll try using it to do stuff.
"Run" tasks
What good is our script if it can't run tasks? Let's add something to simulate "running" our hypothetical tasks, by adding the following to our taskable.py file:
...
    print("group", taskbook.group)
    for task in taskbook.tasks:
        print(f"run {task.module}: {task.name}")
...
Our taskable.py file so far:
# taskable.py
import argparse
from typing import Any

import cattrs
import yaml
from attrs import define, field


@define
class Task:
    name: str
    module: str
    options: dict[str, Any] = field(factory=dict)


@define
class Taskbook:
    group: str
    tasks: list[Task]


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("file", type=argparse.FileType("r"))
    args = parser.parse_args()
    data = yaml.safe_load(args.file)
    taskbook = cattrs.structure(data, Taskbook)
    print("group", taskbook.group)
    for task in taskbook.tasks:
        print(f"run {task.module}: {task.name}")


if __name__ == "__main__":
    main()
Running our hypothetical tasks will output the following:
$ python3 taskable.py taskbook.yml
group localhost
run saucy.copy: copy file.txt to the place
run cheesy.package: install a package
run lettuce.service: enable the service
It's not hard to imagine connecting this skeleton to real module implementations to drive real task execution.
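One way that connection might look is a registry that maps module names to functions. Everything in this sketch is hypothetical: the MODULES registry, the copy_file function, and run_task are names I made up, and a dataclass stands in for the attrs Task class so the example runs without third-party packages.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Task:
    # Stand-in for the attrs Task class, so this sketch runs on its own.
    name: str
    module: str
    options: dict[str, Any] = field(default_factory=dict)


def copy_file(options: dict[str, Any]) -> str:
    # Hypothetical implementation; a real one would copy the file.
    return f"copy {options['source']} -> {options['dest']}"


# Hypothetical registry mapping module names to implementations.
MODULES: dict[str, Callable[[dict[str, Any]], str]] = {
    "saucy.copy": copy_file,
}


def run_task(task: Task) -> str:
    try:
        implementation = MODULES[task.module]
    except KeyError:
        raise ValueError(f"unknown module: {task.module}") from None
    return implementation(task.options)


task = Task(
    name="copy file.txt to the place",
    module="saucy.copy",
    options={"source": "file.txt", "dest": "/etc/file.txt"},
)
print(run_task(task))
```

A lookup table like this keeps the dispatch code dumb: supporting a new module means adding one entry, not another branch.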
List used modules
Maybe we'd like to inspect our taskbook to find out what modules it uses. This would be useful, for example, to install necessary modules before running our tasks.
Let's add a -l / --list option to list used modules and exit without running the tasks:
...
    parser.add_argument("-l", "--list", action="store_true")
...
    if args.list:
        used_modules = sorted(set(task.module for task in taskbook.tasks))
        for module in used_modules:
            print(module)
        return
...
Our taskable.py file so far:
# taskable.py
import argparse
from typing import Any

import cattrs
import yaml
from attrs import define, field


@define
class Task:
    name: str
    module: str
    options: dict[str, Any] = field(factory=dict)


@define
class Taskbook:
    group: str
    tasks: list[Task]


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("-l", "--list", action="store_true")
    parser.add_argument("file", type=argparse.FileType("r"))
    args = parser.parse_args()
    data = yaml.safe_load(args.file)
    taskbook = cattrs.structure(data, Taskbook)
    if args.list:
        used_modules = sorted(set(task.module for task in taskbook.tasks))
        for module in used_modules:
            print(module)
        return
    print("group", taskbook.group)
    for task in taskbook.tasks:
        print(f"run {task.module}: {task.name}")


if __name__ == "__main__":
    main()
Running taskable.py with the list mode enabled:
$ python3 taskable.py -l taskbook.yml
cheesy.package
lettuce.service
saucy.copy
Woot! Static analysis! And it was easy to implement because our data model is so well-defined.
Summary
In this tutorial, we've built up a versatile config loading mechanism.
This setup works as well for tiny command line utilities as it does for large, complex file formats like task workflows, specifications, and so on. You can continue growing your application by adding new fields and new data models, and avoid the malignant technical debt that springs from a muddled early config implementation.
The best part? Your configuration will be stable and serve as the bedrock of your application, now and in the future. In the words of Eric S. Raymond:
Smart data structures and dumb code works a lot better than the other way around.
— Eric S. Raymond
Stay smart, people! 😄