Intro
At my company, we build a platform that helps developers deploy their apps on AWS easily. One of our major features is the Preview Environment, which lets any developer create a full replica of the production environment for every pull request. It's convenient, but it meant we had to find a way to clone the apps and the databases, data included. That's why I created RepliByte, an open-source tool written in Rust that synchronizes cloud databases and hides sensitive data.
Back up your prod Postgres DB into S3
source:
  connection_uri: $DATABASE_URL
  encryption_key: $MY_PRIVATE_ENC_KEY # optional
bridge:
  bucket: $BUCKET_NAME
  access_key_id: $ACCESS_KEY_ID
  secret_access_key: $AWS_SECRET_ACCESS_KEY
To run the backup
replibyte -c prod-conf.yaml backup run
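The configuration above refers to environment variables such as $DATABASE_URL, so it can help to confirm they are set before launching a backup. The following is a minimal sketch of such a pre-flight check, not part of RepliByte: the function name `missing_env_vars` and the placeholder pattern are assumptions made for illustration, and optional keys (like the encryption key) will be reported too if unset.

```python
import os
import re

# Matches $NAME placeholders such as $DATABASE_URL (assumed placeholder syntax).
ENV_REF = re.compile(r"\$([A-Za-z_][A-Za-z0-9_]*)")


def missing_env_vars(config_text, env=None):
    """Return referenced variable names that are unset or empty, in first-seen order."""
    env = os.environ if env is None else env
    missing = []
    for name in ENV_REF.findall(config_text):
        if not env.get(name) and name not in missing:
            missing.append(name)
    return missing


if __name__ == "__main__":
    sample = "source:\n  connection_uri: $DATABASE_URL\nbridge:\n  bucket: $BUCKET_NAME\n"
    # Hypothetical environment used only for this demonstration.
    print(missing_env_vars(sample, env={"DATABASE_URL": "postgres://localhost/dev"}))
```

Running a check like this before `replibyte -c prod-conf.yaml backup run` gives a clearer error than a failed connection halfway through.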
To list your backups
replibyte -c prod-conf.yaml backup list
type       name                 size  when                   compressed encrypted
PostgreSQL backup-1647706359405 154MB Yesterday at 03:00 am  true       true
PostgreSQL backup-1647731334517 152MB 2 days ago at 03:00 am true       true
PostgreSQL backup-1647734369306 149MB 3 days ago at 03:00 am true       true
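The numeric suffix in each backup name looks like a Unix timestamp in milliseconds. That is an inference from the sample output, not something stated here, so treat the sketch below as illustrative only. The helper names `backup_created_at` and `newest_backup` are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def backup_created_at(name):
    """Parse 'backup-<epoch millis>' into a UTC datetime (assumed naming scheme)."""
    prefix, _, suffix = name.rpartition("-")
    if prefix != "backup" or not suffix.isdigit():
        raise ValueError(f"unexpected backup name: {name!r}")
    millis = int(suffix)
    # Integer arithmetic avoids float rounding on the millisecond part.
    seconds = datetime.fromtimestamp(millis // 1000, tz=timezone.utc)
    return seconds + timedelta(milliseconds=millis % 1000)


def newest_backup(names):
    """Return the name whose embedded timestamp is the most recent."""
    return max(names, key=backup_created_at)


if __name__ == "__main__":
    names = ["backup-1647706359405", "backup-1647731334517", "backup-1647734369306"]
    print(newest_backup(names), backup_created_at(newest_backup(names)).isoformat())
```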
Clean sensitive data
RepliByte provides Transformers to clean up the sensitive data in your database.
# Transformers
Here is a list of all the transformers available.
| id | description | available |
| --------------- | -------------------------------------------------------------------------------------------------- | --------- |
| transient | Does not modify the value | yes |
| random | Randomize value but keep the same length (string only). [AAA]->[BBB] | yes |
| first-name | Replace the string value by a first name | yes |
| email | Replace the string value by an email address | yes |
| keep-first-char | Keep only the first char for strings and digit for numbers | yes |
| phone-number | Replace the string value by a phone number | yes |
| credit-card | Replace the string value by a credit card number | yes |
| redacted | Obfuscate your sensitive data (>3 characters strings only). [4242 4242 4242 4242]->[424**********] | yes |
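To make the table more concrete, here is a rough Python sketch of what three of these transformers do to a value. It is not RepliByte's Rust implementation. The function names, the three visible characters and the ten mask characters in `redacted` are assumptions read off the example in the table, and leaving strings of three characters or fewer untouched is my interpretation of ">3 characters strings only".

```python
import random
import string


def keep_first_char(value):
    """Keep the first character of a string, or the leading digit of a number."""
    if isinstance(value, str):
        return value[:1]
    # Numbers: keep the leading digit and ignore the sign (assumption).
    return int(str(abs(int(value)))[0])


def redacted(value, visible=3, mask="*", width=10):
    """Keep the first `visible` characters and append a fixed run of mask characters."""
    if len(value) <= 3:
        return value  # assumed: short strings are left as they are
    return value[:visible] + mask * width


def random_same_length(value, rng=None):
    """Replace a string with random letters of the same length."""
    rng = rng if rng is not None else random.Random()
    return "".join(rng.choice(string.ascii_letters) for _ in value)


if __name__ == "__main__":
    print(keep_first_char("alice"), keep_first_char(2048))
    print(redacted("4242 4242 4242 4242"))
    print(random_same_length("AAA"))
```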
To use the Transformers, you need to edit your configuration file and add them:
source:
  connection_uri: $DATABASE_URL
  encryption_key: $MY_PRIVATE_ENC_KEY # optional
  transformers:
    - database: public
      table: employees
      columns:
        - name: last_name
          transformer_name: random
        - name: birth_date
          transformer_name: random-date
        - name: first_name
          transformer_name: first-name
        - name: email
          transformer_name: email
        - name: username
          transformer_name: keep-first-char
    - database: public
      table: customers
      columns:
        - name: phone
          transformer_name: phone-number
bridge:
  bucket: $BUCKET_NAME
  access_key_id: $ACCESS_KEY_ID
  secret_access_key: $AWS_SECRET_ACCESS_KEY
Then your sensitive data will be hidden while seeding your dev Postgres DB.
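Conceptually, the transformers section maps each (database, table, column) to a transformer, and columns that are not listed pass through unchanged. The sketch below shows that idea in Python. The registry, the stand-in transformer bodies and the function names `build_plan` and `transform_row` are all hypothetical and do not reflect how RepliByte is implemented internally.

```python
# Stand-in transformers for illustration only; real ones generate realistic fake values.
TRANSFORMERS = {
    "transient": lambda value: value,
    "keep-first-char": lambda value: value[:1],
    "email": lambda value: "user@example.com",
}


def build_plan(transformer_rules):
    """Turn the parsed `transformers` list into a (database, table, column) lookup."""
    plan = {}
    for rule in transformer_rules:
        for column in rule["columns"]:
            key = (rule["database"], rule["table"], column["name"])
            plan[key] = TRANSFORMERS[column["transformer_name"]]
    return plan


def transform_row(plan, database, table, row):
    """Apply the matching transformer to each column; unlisted columns are kept."""
    keep = TRANSFORMERS["transient"]
    return {
        name: plan.get((database, table, name), keep)(value)
        for name, value in row.items()
    }


if __name__ == "__main__":
    rules = [{
        "database": "public",
        "table": "employees",
        "columns": [
            {"name": "email", "transformer_name": "email"},
            {"name": "username", "transformer_name": "keep-first-char"},
        ],
    }]
    row = {"id": 1, "username": "jdoe", "email": "jdoe@corp.example"}
    print(transform_row(build_plan(rules), "public", "employees", row))
```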
Seed your dev Postgres DB
To restore a backup, you first need to declare a destination in your YAML config file.
bridge:
  bucket: $BUCKET_NAME
  access_key_id: $ACCESS_KEY_ID
  secret_access_key: $AWS_SECRET_ACCESS_KEY
destination:
  connection_uri: $DATABASE_URL
  decryption_key: $MY_PUBLIC_DEC_KEY # optional
Then run replibyte backup list to list all the available backups:
replibyte -c prod-conf.yaml backup list
type       name                 size  when                   compressed encrypted
PostgreSQL backup-1647706359405 154MB Yesterday at 03:00 am  true       true
PostgreSQL backup-1647731334517 152MB 2 days ago at 03:00 am true       true
PostgreSQL backup-1647734369306 149MB 3 days ago at 03:00 am true       true
Then run replibyte restore to seed your dev database:
replibyte -c prod-conf.yaml restore -v latest
OR
replibyte -c prod-conf.yaml restore -v backup-1647706359405
What else?
RepliByte is written in Rust and performs all operations on the fly, so no extra disk space is consumed and no intermediate dump is left on disk, which reduces the risk of a data leak.
RepliByte also supports MongoDB (thanks to Benny, a contributor)
Complete data synchronization
Works on different cloud providers
You can use multiple transformers to hide your sensitive data
Designed to back up terabytes of data
Skip data sync for specific tables
On-the-fly data compression/decompression (Zlib) and encryption/decryption (AES-256)
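To illustrate what "on the fly" means for compression, here is a small Python sketch that compresses and decompresses a stream chunk by chunk with the standard `zlib` module, so the full dump never has to sit in memory or in a temporary file. It only demonstrates the general streaming technique, not RepliByte's Rust code. The function names are made up for this example, and encryption is left out because it would need a third-party library.

```python
import zlib
from typing import Iterable, Iterator


def compress_stream(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Yield zlib-compressed bytes as input chunks arrive."""
    compressor = zlib.compressobj()
    for chunk in chunks:
        out = compressor.compress(chunk)
        if out:  # the compressor may buffer and return nothing for a chunk
            yield out
    tail = compressor.flush()
    if tail:
        yield tail


def decompress_stream(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Yield decompressed bytes as compressed chunks arrive."""
    decompressor = zlib.decompressobj()
    for chunk in chunks:
        out = decompressor.decompress(chunk)
        if out:
            yield out
    tail = decompressor.flush()
    if tail:
        yield tail


if __name__ == "__main__":
    dump = [b"INSERT INTO employees VALUES (1);\n" * 1000, b"COMMIT;\n"]
    restored = b"".join(decompress_stream(compress_stream(dump)))
    print(restored == b"".join(dump))
```

Because both functions are generators, each stage can be chained to the next (read, transform, compress, upload) without materializing the whole backup.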
Conclusion
RepliByte is a command-line tool that makes database seeding easy and convenient. I am working on a way to restore a database locally with Docker in one command. More is coming, so stay tuned and feel free to share your feedback.
RepliByte GitHub: https://github.com/Qovery/replibyte