This article is about simple encrypted storage for Python applications using AES/GCM and base64.
Introduction
Sometimes you have secret constants in the code (passwords, tokens, API keys, etc.).
The standard approach is to set them via environment variables. This approach is safe, but managing the setup is inconvenient: you must ensure that all required environment variables are present before the code runs.
In CI/CD, you would use a secrets storage or manager. In deployment, you need to figure out how to inject the environment variables for each target, whether it is staging, production, etc.
It may be more convenient if the code manages everything itself. It only needs to know the target environment (CI/CD, staging, production, etc.) and a single key/password that unlocks all other secret settings.
The approach below lets you ship all your settings (including secret or sensitive ones) with the code, because they are encrypted at rest.
When the code runs, it decrypts the secrets in memory using the key, which it reads from the environment as an external setting.
To fully configure your code, you only need to set two environment variables beforehand:
- ENVIRONMENT (for example, dev, ci or prod)
- ENVIRONMENT_PASSWORD
Setup
To configure the secrets vault, you need to set one environment variable: ENVIRONMENT_PASSWORD.
This variable is used to encrypt and decrypt your secrets.
Always keep the value of this variable secret. Set it locally in your development environment, or inject it into your CI/CD via the secrets manager/storage provided by your hosting (for example, GitHub Secrets).
Prepare your secrets
We put the secrets in the code in encrypted form. To encrypt the secrets, you can use the following CLI:
python vault_cli.py --help
For example:
python vault_cli.py encrypt my_secret_password
Output:
aes:4XgUg6US2yzMUf6367PNGlQb0RsVXV5I20VqJhwGTVOcgcdLQdj3WLOnISzC82xaPmE=
You use this value in the code.
The PASSWORD and KEY values in the example below were encrypted this way.
Example
Before running this example, you need to set ENVIRONMENT_PASSWORD:
export ENVIRONMENT_PASSWORD=super_secret
The code below contains both plain and encrypted values in settings.
The setting() function returns a setting's value by name, passing it through vault.decrypt(), which leaves plain values unchanged.
At startup, the code calls vault.check() from vault.py to verify the consistency of the vault. If the value of ENVIRONMENT_PASSWORD is incorrect, the function throws an exception.
The vault.check() function relies on a hardcoded value that exists for the self-check only; user code does not use it.
Are you curious what that magic DIGEST variable contains? :) You can decrypt it with the python vault_cli.py decrypt command.
```python
import vault

settings = {
    "USERNAME": "staff",
    "PASSWORD": "aes:ANyPgY1HTuJPS2mo2Xze/+MUZl8992deZdFHYml67QKWCJ0NzX4=",
    "KEY": "aes:cGJKLBYfGXMLX/oTulGxZFz97ap+SZ3uJBODogfixTzjnIW82g==",
}


def setting(name: str) -> str:
    return vault.decrypt(settings[name])


def main():
    print('login credentials')
    print('username =', setting('USERNAME'))
    print('password =', setting('PASSWORD'))
    print('key =', setting('KEY'))


if __name__ == '__main__':
    vault.check()
    main()
```
Why is it all useful?
It is safe to commit this code to public repositories: all sensitive values are protected by AES/GCM encryption, so their safety rests entirely on ENVIRONMENT_PASSWORD being strong and secret.
Of course, you should NEVER expose ENVIRONMENT_PASSWORD.
Security considerations
This approach has downsides that you should understand.
If your environment password leaks, all environments that use it are compromised. However, this threat is not much different from the threat of your full set of environment settings leaking.
To avoid compromising all environments at once, you can use a different environment password for each environment.
This requires minor code changes, which I leave to you as a homework exercise.
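One possible direction for that exercise, sketched under assumptions: each environment gets its own variable, named here `ENVIRONMENT_PASSWORD_<ENV>` (a hypothetical naming scheme), and the code picks the one matching `ENVIRONMENT`. Secrets for each environment would then be encrypted with that environment's password.

```python
import os
from typing import Mapping


def environment_password(env: Mapping[str, str] = os.environ) -> str:
    """Pick the password for the current ENVIRONMENT.

    Hypothetical scheme: ENVIRONMENT=prod reads ENVIRONMENT_PASSWORD_PROD.
    """
    environment = env.get("ENVIRONMENT")
    if not environment:
        raise RuntimeError("ENVIRONMENT is not set")
    name = "ENVIRONMENT_PASSWORD_" + environment.upper()
    password = env.get(name)
    if not password:
        raise RuntimeError(name + " is not set")
    return password
```

With this split, a leaked dev password exposes only the values encrypted for dev, while prod secrets stay protected by a different key.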
Finally, you should know how the encryption in this method works.
We use AES with a 256-bit key in GCM mode. GCM does not require padding. It is also an authenticated mode: decryption verifies the integrity of the data and fails if the data was modified or the key is wrong.
The 256-bit key is derived from the password with SHA-256.
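The key derivation is a single SHA-256 hash, so it always yields 32 bytes (a 256-bit AES key) regardless of password length, and the same password always yields the same key. The snippet reproduces the article's `get_aes_key`; the sample passwords are made up.

```python
import hashlib


def get_aes_key(password: str) -> bytes:
    # SHA-256 digest: always 32 bytes, i.e. an AES-256 key.
    return hashlib.sha256(password.encode()).digest()


short_key = get_aes_key("x")
long_key = get_aes_key("a much longer environment password")
```

Because SHA-256 is fast, anyone holding the encrypted values can test password guesses quickly, so ENVIRONMENT_PASSWORD should be long and random. A deliberately slow derivation such as the standard library's `hashlib.pbkdf2_hmac` would raise the cost of guessing; that is a different technique from the one used here.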
The nonce (initialization vector for AES) is generated randomly. It ensures that the same data looks different each time it is encrypted, so it plays the role of a salt.
AES/GCM encryption involves the ciphertext (encrypted data), the nonce and the tag. All three are needed for decryption.
The encrypt_value_aes function concatenates all three values (nonce, tag, then ciphertext), and the encrypt function encodes the result in base64 and prepends the aes: prefix.
The prefix makes encrypted values recognizable, so decryption can skip values that are not encrypted.
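```python
import base64
import hashlib
import secrets
from typing import Optional

from Crypto.Cipher import AES  # pycryptodome

# get_password() lives elsewhere in vault.py and is not shown in this article;
# it resolves the password to use (presumably ENVIRONMENT_PASSWORD if none is passed).


def get_aes_key(password: str) -> bytes:
    # 256-bit AES key derived from the password with SHA-256.
    return hashlib.sha256(password.encode()).digest()


def encrypt_value_aes(value: bytes, password: str) -> bytes:
    # A fresh random nonce per call: equal values encrypt differently.
    nonce = secrets.token_bytes(16)
    aes = AES.new(get_aes_key(password), AES.MODE_GCM, nonce=nonce)
    encrypted, tag = aes.encrypt_and_digest(value)
    # Layout: nonce (16 bytes) + tag (16 bytes) + ciphertext.
    return nonce + tag + encrypted


def encrypt(value: str, password: Optional[str] = None) -> str:
    encrypted = encrypt_value_aes(value.encode(), get_password(password))
    return 'aes:' + base64.b64encode(encrypted).decode()
```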
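The resulting layout can be inspected without the password, using only the standard library. The sketch below (the helper name `split_token` is hypothetical) strips the `aes:` prefix, decodes base64 and splits the bytes into the 16-byte nonce, the 16-byte tag (pycryptodome's default GCM tag length) and the ciphertext. Applied to the CLI output shown earlier, the ciphertext is 18 bytes, exactly the length of `my_secret_password`, since GCM adds no padding.

```python
import base64
from typing import Tuple

PREFIX = "aes:"
NONCE_SIZE = 16  # secrets.token_bytes(16) in encrypt_value_aes
TAG_SIZE = 16    # pycryptodome's default GCM tag length


def split_token(token: str) -> Tuple[bytes, bytes, bytes]:
    """Split an 'aes:' token into (nonce, tag, ciphertext); hypothetical helper."""
    if not token.startswith(PREFIX):
        raise ValueError("not an encrypted value")
    raw = base64.b64decode(token[len(PREFIX):])
    nonce = raw[:NONCE_SIZE]
    tag = raw[NONCE_SIZE:NONCE_SIZE + TAG_SIZE]
    ciphertext = raw[NONCE_SIZE + TAG_SIZE:]
    return nonce, tag, ciphertext


token = "aes:4XgUg6US2yzMUf6367PNGlQb0RsVXV5I20VqJhwGTVOcgcdLQdj3WLOnISzC82xaPmE="
nonce, tag, ciphertext = split_token(token)
print(len(nonce), len(tag), len(ciphertext))  # 16 16 18
```

This also shows that the encrypted form reveals the length of the secret, which is usually acceptable for passwords and keys.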
The decryption works in reverse.
Links
All the code from this article is in the github.com/begoon/pyvault GitHub repository.