Intro
At my company, we build a platform that helps developers easily deploy their apps on AWS. One of our major features is the Preview Environment, which lets any developer create a full replica of the production environment for every pull request. It's convenient, but we had to find a way to clone the apps and the databases, data included. That's why I created RepliByte, an open-source tool written in Rust to synchronize cloud databases and hide sensitive data.
Back up your prod Postgres DB into S3
source:
  connection_uri: $DATABASE_URL
  encryption_key: $MY_PRIVATE_ENC_KEY # optional 
bridge:
  bucket: $BUCKET_NAME
  access_key_id: $ACCESS_KEY_ID
  secret_access_key: $AWS_SECRET_ACCESS_KEY
To run the backup
replibyte -c prod-conf.yaml backup run
To list your backups
replibyte -c prod-conf.yaml backup list
type          name                    size    when                    compressed  encrypted
PostgreSQL    backup-1647706359405    154MB   Yesterday at 03:00 am   true        true
PostgreSQL    backup-1647731334517    152MB   2 days ago at 03:00 am  true        true
PostgreSQL    backup-1647734369306    149MB   3 days ago at 03:00 am  true        true
Clean sensitive data
RepliByte provides transformers to clean up the sensitive data in your database.
# Transformers
Here is a list of all the transformers available.
| id              | description                                                                                        | available |
| --------------- | -------------------------------------------------------------------------------------------------- | --------- |
| transient       | Does not modify the value                                                                          | yes       |
| random          | Randomize value but keep the same length (string only). [AAA]->[BBB]                               | yes       |
| first-name      | Replace the string value by a first name                                                           | yes       |
| email           | Replace the string value by an email address                                                       | yes       |
| keep-first-char | Keep only the first char for strings and digit for numbers                                         | yes       |
| phone-number    | Replace the string value by a phone number                                                         | yes       |
| credit-card     | Replace the string value by a credit card number                                                   | yes       |
| redacted        | Obfuscate your sensitive data (>3 characters strings only). [4242 4242 4242 4242]->[424**********] | yes       |
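To make the table more concrete, here is a minimal Python sketch of what three of these transformers might do to a value. It is not RepliByte's actual Rust implementation: the function names are hypothetical, and the 3-character prefix and 10-character mask in `redacted` are assumptions read off the example in the table.

```python
import random
import string


def keep_first_char(value):
    """Keep the first character of a string, or the first digit of an integer."""
    if isinstance(value, int):
        sign = -1 if value < 0 else 1
        return sign * int(str(abs(value))[0])
    return value[:1]


def redacted(value, visible=3, mask_len=10):
    """Keep the first `visible` characters and append a fixed-width mask.

    Strings of `visible` characters or fewer are returned unchanged
    (assumption based on the ">3 characters strings only" note).
    """
    if len(value) <= visible:
        return value
    return value[:visible] + "*" * mask_len


def random_same_length(value, rng=random):
    """Replace a string with random ASCII letters of the same length."""
    return "".join(rng.choice(string.ascii_letters) for _ in value)


print(keep_first_char("Lucas"))          # L
print(redacted("4242 4242 4242 4242"))   # "424" followed by 10 asterisks
print(len(random_same_length("AAA")))    # 3
```

The point of the sketch is that each transformer is a small pure function from an original value to a replacement value, which is what lets them be applied column by column.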
To use the Transformers, you need to edit your configuration file and add them:
source:
  connection_uri: $DATABASE_URL
  encryption_key: $MY_PRIVATE_ENC_KEY # optional 
  transformers:
    - database: public
      table: employees
      columns:
        - name: last_name
          transformer_name: random
        - name: birth_date
          transformer_name: random-date
        - name: first_name
          transformer_name: first-name
        - name: email
          transformer_name: email
        - name: username
          transformer_name: keep-first-char
    - database: public
      table: customers
      columns:
        - name: phone
          transformer_name: phone-number
bridge:
  bucket: $BUCKET_NAME
  access_key_id: $ACCESS_KEY_ID
  secret_access_key: $AWS_SECRET_ACCESS_KEY
Your sensitive data will then be hidden when you seed your dev Postgres DB.
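One way to picture how such a configuration could be applied is a lookup keyed by database, table, and column. The Python sketch below is only an illustration under that assumption: the dictionary mirrors a fragment of the YAML above, and `build_lookup`, `transform_row`, and the stand-in transformer functions are hypothetical names, not part of RepliByte.

```python
# Hypothetical in-memory equivalent of a fragment of the `transformers` section.
TRANSFORMERS_CONFIG = [
    {
        "database": "public",
        "table": "employees",
        "columns": [
            {"name": "username", "transformer_name": "keep-first-char"},
        ],
    },
]

# Stand-in implementations, for illustration only.
TRANSFORMER_FUNCS = {
    "transient": lambda v: v,            # leaves the value untouched
    "keep-first-char": lambda v: v[:1],  # keeps only the first character
}


def build_lookup(config):
    """Index transformer names by (database, table, column)."""
    lookup = {}
    for entry in config:
        for column in entry["columns"]:
            key = (entry["database"], entry["table"], column["name"])
            lookup[key] = column["transformer_name"]
    return lookup


def transform_row(lookup, database, table, row):
    """Apply the configured transformer to each column; others pass through."""
    out = {}
    for column, value in row.items():
        name = lookup.get((database, table, column), "transient")
        out[column] = TRANSFORMER_FUNCS[name](value)
    return out


lookup = build_lookup(TRANSFORMERS_CONFIG)
row = {"id": 7, "username": "jdoe"}
print(transform_row(lookup, "public", "employees", row))
# {'id': 7, 'username': 'j'}
```

Columns that are not listed in the configuration fall back to `transient` here, so they are copied unchanged.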
Seed your dev Postgres DB
To restore a backup, you first need to declare a destination in your YAML config file.
bridge:
  bucket: $BUCKET_NAME
  access_key_id: $ACCESS_KEY_ID
  secret_access_key: $AWS_SECRET_ACCESS_KEY
destination:
  connection_uri: $DATABASE_URL
  decryption_key: $MY_PUBLIC_DEC_KEY # optional
Then, run replibyte backup list to list all the available backups
replibyte -c prod-conf.yaml backup list
type          name                    size    when                    compressed  encrypted
PostgreSQL    backup-1647706359405    154MB   Yesterday at 03:00 am   true        true
PostgreSQL    backup-1647731334517    152MB   2 days ago at 03:00 am  true        true
PostgreSQL    backup-1647734369306    149MB   3 days ago at 03:00 am  true        true
and replibyte restore to seed your dev database
replibyte -c prod-conf.yaml restore -v latest
OR 
replibyte -c prod-conf.yaml restore -v backup-1647706359405
What else?
- RepliByte is written in Rust and all operations are performed on the fly, which means no extra disk space is consumed and there is no risk of leaking data.
- RepliByte also supports MongoDB (thanks to Benny, a contributor)
- Complete data synchronization
- Works on any cloud provider
- You can use multiple transformers to hide your sensitive data
- Designed to back up terabytes of data
- Skip data sync for specific tables
- On-the-fly data compression/decompression (Zlib) and encryption/decryption (AES-256)
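As a rough illustration of what "on the fly" means for the compression step, the following Python sketch streams data through Zlib chunk by chunk instead of writing a full dump to disk first. It uses only the standard `zlib` module and is not RepliByte's Rust code; the function names, the chunk size, and the sample data are assumptions made for the example, and encryption is left out because it would need a third-party library.

```python
import io
import zlib


def compress_stream(src, dst, chunk_size=64 * 1024):
    """Compress `src` into `dst` chunk by chunk, never holding the whole dump."""
    compressor = zlib.compressobj()
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(compressor.compress(chunk))
    dst.write(compressor.flush())  # emit whatever is still buffered


def decompress_stream(src, dst, chunk_size=64 * 1024):
    """Reverse of compress_stream, also working one chunk at a time."""
    decompressor = zlib.decompressobj()
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(decompressor.decompress(chunk))
    dst.write(decompressor.flush())


# Sample data standing in for a database dump.
dump = b"INSERT INTO employees VALUES (1, 'x');\n" * 10_000

compressed = io.BytesIO()
compress_stream(io.BytesIO(dump), compressed)

compressed.seek(0)
restored = io.BytesIO()
decompress_stream(compressed, restored)

print(restored.getvalue() == dump)  # True
```

In-memory buffers are used here to keep the example self-contained; in a real pipeline `src` and `dst` would be streams such as a database dump and an upload to object storage, so memory use stays bounded by the chunk size.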
Conclusion
RepliByte is a command line tool that makes database seeding super easy and convenient. I am working on a way to restore a database locally with Docker in one command. More is coming so stay tuned and feel free to share your feedback.
RepliByte GitHub: https://github.com/Qovery/replibyte
 

 
    