A pattern that seemingly gets little attention in the Python space is how to manage environment variables reasonably. Frankly, for how often we use environment variables, there is no great solution to this.
What am I talking about? When a program is invoked, it runs within a host environment where the user can define variables that the program can refer to. These are known as "environment variables", and they let the user hold secrets like passwords, or configuration like file directories, to modify a program's behavior.
This is useful because we no longer have to hard-code passwords or store them in our Git projects. In DevOps, environment variables can be set up within the deploy or continuous integration stages so that programs receive their variables in a secure manner.
However, environment variables aren't actually fun to load, and lead to serious code bloat and other unsightly things that are tedious to dig through and maintain.
An example: let's say you want to import a set of environment variables crucial to your application. You might create a file like this, which gets loaded in with your program in the CI process (more likely than not this file is treated as a plain Bash script and gets source'd; note that the variables typically need to be export'ed for a child process like Python to actually see them).
PATH_TO_DATA=/var/data
DB_URL=myapp.us.abc.s3.aws.amazon.com
DB_PASS=cupcake_sprinkles12345
These variables now exist in the program's host environment. If you don't want a file at all, you can set any variable you like right on the command line:
$ MY_ARBITRARY_VARIABLE=5 python my_project.py
Getting these values into the Python program space is done using the os module, which deals with operating-system-level functions and interactions. Using the os.getenv function, you can simply try to 'get' a variable.
import os
path_to_data = os.getenv("PATH_TO_DATA")
But as you can see, if this variable were ever to be renamed, we're stuck changing it not once, not twice, but three times in total: the environment file, the Python variable, and the reference string.
Annoying, and just wait until there are multiple variables. But there's a different way too. All the environment variables set by your running session are available in Python via a global dictionary-like object. Using os.environ, you can see every variable your session has set, whether via manual definition or shell profile definitions.
import os
print(os.environ)
environ({'PATH_TO_DATA': '/var/data', ...})
That's a little bit easier to pass around, but unpacking all of those key-value pairs is a pain. If you want only a few out of a large set of environment variables, you're still left holding the bag, doing a lot of name definition by hand.
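To make the pain point concrete, here is roughly what the "pick out only a few" step looks like as a plain dict comprehension (the variable names are just the ones from the earlier example):

```python
import os

# hypothetical variable names for illustration
wanted = ["PATH_TO_DATA", "DB_URL", "DB_PASS"]

# pick out only the keys we care about, defaulting to "" when unset
config = {k: os.environ.get(k, "") for k in wanted}
```

It works, but you still have to maintain that `wanted` list by hand somewhere.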
(As a side-note: if you've guessed it already, os.getenv is simply a thin wrapper around os.environ.get, and it takes an optional second argument as the fall-back value when a key doesn't exist, just like dict.get does.)
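A quick sketch of that equivalence; the snippet clears the hypothetical MISSING_VAR first so the result is deterministic:

```python
import os

# make sure the key really is missing so the example is deterministic
os.environ.pop("MISSING_VAR", None)

# os.getenv is a thin wrapper around os.environ.get;
# both take an optional default for absent keys
a = os.getenv("MISSING_VAR", "fallback")
b = os.environ.get("MISSING_VAR", "fallback")

# without a default, both return None rather than raising
c = os.getenv("MISSING_VAR")
```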
Automatic Environment Variable Loading
If your project requires environment variables to be loaded across multiple modules, it might be annoying to do this all the time. And at the same time, it might be annoying to pass a gigantic dictionary of environment variables through your modules, especially if not all of them are even required.
An idea that occurred to me here is to employ the dataclasses library, which offers a way of defining a sort of namedtuple (if anyone remembers those) for plain data that doesn't need full class machinery.
from dataclasses import dataclass

@dataclass
class MyEnv:
    PATH_TO_DATA: str
    DB_URL: str
    DB_PASS: str
Now that looks like something interesting, and it uses typing annotations for better mypy analysis (... an environment dictionary is only ever strings... this is silly... anyways...), but the fundamental problem remains: the dataclass still needs to be populated with information. This alone isn't enough to get the job done.
def init_env():
    return MyEnv(
        os.environ["PATH_TO_DATA"],
        os.environ["DB_URL"],
        os.environ["DB_PASS"],
    )
Yuck, is what you should be saying. I already hate it. Granted, it's not the worst way to initialize something that can be shared and has formal rules, but it's still tedious to write. Lots of repetitive lookups.
The clever code-golfer in you might already see the issue here: the repeated os.environ calls. Why not just lift them into a list comprehension and unpack it as arguments?
def init_env():
    return MyEnv(
        *[os.environ[v] for v in
          ["PATH_TO_DATA", "DB_URL", "DB_PASS"]]
    )
Congratulations, you just annoyed everyone in your project with your clever code-golf. Yes, it's more functional, yes, it's less code and it will be fast, but still, I think we can probably do better than this, right?
The problem with environment variables is that they can lead to some weird code choices: you have a known list of variables you want to read, but somehow you need to get it into code. Maybe you create constants which hold the names of the variables you want to read, then read them later. Or maybe you use a list or a dict of mappings to organize how they come in.
The problem I see is that it creates too many pain-points of having to update code across a large project. You change one environment variable name, suddenly you're changing a ton of files just to update for that one reference. This isn't really a problem that can be aided by any software tricks.
If you want to bind variables to a collection like an Enum or a dataclass, then you need to share that object across multiple files again, and it all sort of leads back to a single source of pain.
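For completeness, here's a minimal sketch of the Enum route (the variable names are placeholders), which shows how the shared-object problem creeps right back in:

```python
import os
from enum import Enum

class EnvVars(Enum):
    # each member's value is the environment variable's name
    PATH_TO_DATA = "PATH_TO_DATA"
    DB_URL = "DB_URL"

def read(var: EnvVars) -> str:
    # every module that reads config must import EnvVars,
    # so the shared-object problem comes right back
    return os.environ.get(var.value, "")

data_path = read(EnvVars.PATH_TO_DATA)
```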
Let me rewind a bit with a different path that maybe we can try to tackle. All of our environment variables are loaded into a dictionary-like object, os.environ.
import os
os.environ["PATH_TO_DATA"] # /var/data
A Python variable for this would be defined as something like:
import os
PATH_TO_DATA = os.environ["PATH_TO_DATA"]
The variable name is the same as the environment variable's. Wouldn't it be nice if those environment variables could simply be turned into Python variables for us, automatically?
Via the special method __setattr__, we can actually do something like this very easily. For this we will need a new class to generate instances of environment info for us.
import os

class EnvInfo:
    def __init__(self, args=None):
        # use None instead of a mutable [] default argument
        for k in args or []:
            self.__setattr__(k, os.environ.get(k, ""))

# test it out
e = EnvInfo(["MY_VAR"])
print(e.MY_VAR)
Now try it out by doing
$ MY_VAR=5 python EnvInfo.py
5
It works by using __setattr__(), which is the special method Python objects use to set attributes (each time you write a self.var = ... declaration, __setattr__() is what binds it). The EnvInfo class itself has no bound attributes, so it's an open book for setting variables.
By handling a list of arguments and looping through them, you can create a dictionary-like object of environment variables that is easy to pass around your codebase. And by providing an explicit list of variables, you are still limiting what variables are exposed to your code, so there are no accidental slip-ups with important secret variables.
We could even go a step further and provide a dictionary of functions to invoke when an environment variable is loaded too. Let's say you wanted to convert things to numbers, or lists, or whatever you fancy.
import os

class EnvInfo:
    def __init__(self, args=None, kwargs=None):
        for k in args or []:
            self.__setattr__(k, os.environ.get(k, ""))
        # run each supplied function over its variable's raw string value
        for key, func in (kwargs or {}).items():
            self.__setattr__(key, func(os.environ.get(key, "")))
e = EnvInfo([], {"MY_VAR": lambda x: int(x)})
print(type(e.MY_VAR))
Try it out again:
$ MY_VAR=5 python EnvInfo.py
<class 'int'>
This provides a way of doing better error reporting or post-load transformations of variables. You might want to do string splits, integer conversions, URL checks, hashing, or any sort of validation. A system like this improves testability and program correctness, and it's relatively simple to implement.
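For instance, here's a sketch of two converter functions you might pass in (the variable names and values are made up, and the EnvInfo class is repeated so the snippet stands alone); a bad value fails loudly at load time instead of deep inside the program:

```python
import os

class EnvInfo:
    def __init__(self, args=None, kwargs=None):
        for k in args or []:
            setattr(self, k, os.environ.get(k, ""))
        for key, func in (kwargs or {}).items():
            setattr(self, key, func(os.environ.get(key, "")))

def to_int(raw: str) -> int:
    # fail loudly at load time with a readable message
    try:
        return int(raw)
    except ValueError:
        raise ValueError(f"expected an integer, got {raw!r}")

def to_list(raw: str) -> list:
    # split comma-separated values like "a.com,b.com"
    return [item for item in raw.split(",") if item]

# simulate the environment for the example
os.environ["WORKER_COUNT"] = "4"
os.environ["ALLOWED_HOSTS"] = "a.com,b.com"

e = EnvInfo(kwargs={"WORKER_COUNT": to_int, "ALLOWED_HOSTS": to_list})
```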
Now, is this an improvement on where we were before with environment variable loading? For the most part, I would say absolutely. In this small example, a main entry point program is responsible for passing environment variables as arguments to some functions.
import os
PATH1 = os.environ["PATH1"]
KEY1 = os.environ["KEY1"]
PATH2 = os.environ["PATH2"]
KEY2 = os.environ["KEY2"]
run_program(PATH1, KEY1)
run_program(PATH2, KEY2)
It's a lot of boilerplate, and as the functions grow in complexity, more environment variables may be needed. Extending the function signature each time is simply too much; it turns into a run-on line of code. With our new system in place, it's a little bit easier. By modifying the run_program signature to take an EnvInfo object instead:
from EnvInfo import EnvInfo
JOB1 = EnvInfo(["PATH1", "KEY1"])
JOB2 = EnvInfo(["PATH2", "KEY2"])
run_program(JOB1)
run_program(JOB2)
It's been simplified quite a bit and looks a lot cleaner. Extending this to support more variables is as simple as appending more strings to the input lists, and nothing else is required. Since the attributes aren't concrete in the EnvInfo implementation, some linters and language servers may not be entirely happy with you, but hey, it works.
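To round it out, here is one hypothetical shape the reworked run_program might take (the body is just a placeholder); reading via vars() means adding a variable later needs no signature change:

```python
import os

class EnvInfo:
    def __init__(self, args=None):
        for k in args or []:
            setattr(self, k, os.environ.get(k, ""))

def run_program(env):
    # vars() exposes whatever attributes this EnvInfo carries,
    # so JOB1 and JOB2 can carry different variable sets
    for name, value in vars(env).items():
        print(f"{name} = {value}")

# simulate the environment for the example
os.environ["PATH1"] = "/var/data"
os.environ["KEY1"] = "abc123"

run_program(EnvInfo(["PATH1", "KEY1"]))
```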
Thanks for reading and hope you enjoyed!