DEV Community

Romaric P.

How to seed your dev Postgres DB with your prod DB with RepliByte

Intro

At my company, we build a platform that helps developers deploy their apps easily on AWS. One of our major features is the Preview Environment, which lets any developer create a full replica of the production environment for every pull request. It's convenient, but we had to find a way to clone the apps and the databases with their data included. That's why I created RepliByte, an open-source tool written in Rust to synchronize cloud databases and hide sensitive data 🔥

Back up your prod Postgres DB to S3

source:
  connection_uri: $DATABASE_URL
  encryption_key: $MY_PRIVATE_ENC_KEY # optional 
bridge:
  bucket: $BUCKET_NAME
  access_key_id: $ACCESS_KEY_ID
  secret_access_key: $AWS_SECRET_ACCESS_KEY
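
The `$NAME` values in this config look like placeholders that are filled in from environment variables. Assuming that is the case, here is a minimal Python sketch of a pre-flight check that reports which placeholders are not set before you run a backup. The helper `missing_variables` is hypothetical and not part of RepliByte; it only scans the config text.

```python
import os
import re
from typing import List, Mapping

# Matches placeholders such as $DATABASE_URL (assumed to be environment variables).
PLACEHOLDER = re.compile(r"\$([A-Za-z_][A-Za-z0-9_]*)")


def missing_variables(config_text: str, env: Mapping[str, str]) -> List[str]:
    """Return placeholder names found in the config that are absent from `env`."""
    missing: List[str] = []
    for line in config_text.splitlines():
        uncommented = line.split("#", 1)[0]  # ignore trailing YAML comments
        for name in PLACEHOLDER.findall(uncommented):
            if name not in env and name not in missing:
                missing.append(name)
    return missing


SAMPLE_CONFIG = """\
source:
  connection_uri: $DATABASE_URL
  encryption_key: $MY_PRIVATE_ENC_KEY # optional
bridge:
  bucket: $BUCKET_NAME
  access_key_id: $ACCESS_KEY_ID
  secret_access_key: $AWS_SECRET_ACCESS_KEY
"""

if __name__ == "__main__":
    # Check the sample config against the current shell environment.
    for name in missing_variables(SAMPLE_CONFIG, os.environ):
        print(f"not set: {name}")
```

The check does not know which keys are optional, so a missing `$MY_PRIVATE_ENC_KEY` is reported like any other.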

To run the backup

replibyte -c prod-conf.yaml backup run

To list your backups

replibyte -c prod-conf.yaml backup list

type          name                    size    when                    compressed  encrypted
PostgreSQL    backup-1647706359405    154MB   Yesterday at 03:00 am   true        true
PostgreSQL    backup-1647731334517    152MB   2 days ago at 03:00 am  true        true
PostgreSQL    backup-1647734369306    149MB   3 days ago at 03:00 am  true        true

Clean sensitive data

RepliByte provides transformers to clean up the sensitive data in your database.

# Transformers

Here is a list of all the transformers available.

| id              | description                                                                                        | available |
| --------------- | -------------------------------------------------------------------------------------------------- | --------- |
| transient       | Does not modify the value                                                                          | yes       |
| random          | Randomize value but keep the same length (string only). [AAA]->[BBB]                               | yes       |
| first-name      | Replace the string value by a first name                                                           | yes       |
| email           | Replace the string value by an email address                                                       | yes       |
| keep-first-char | Keep only the first char for strings and digit for numbers                                         | yes       |
| phone-number    | Replace the string value by a phone number                                                         | yes       |
| credit-card     | Replace the string value by a credit card number                                                   | yes       |
| redacted        | Obfuscate your sensitive data (>3 characters strings only). [4242 4242 4242 4242]->[424**********] | yes       |
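
To make the table concrete, here is a minimal Python sketch of what three of these transformers do to a value. It is only an illustration based on the descriptions above, not RepliByte's Rust implementation: the function names are hypothetical, and the fixed mask width of 10 in `redacted` is an assumption inferred from the table's example.

```python
import random
import string


def random_same_length(value: str) -> str:
    """Like `random`: replace a string with random letters of the same length."""
    return "".join(random.choice(string.ascii_letters) for _ in value)


def keep_first_char(value):
    """Like `keep-first-char`: first character of a string, first digit of a number."""
    if isinstance(value, int):
        return int(str(abs(value))[0])
    return value[:1]


def redacted(value: str, mask: str = "*", width: int = 10) -> str:
    """Like `redacted`: keep the first 3 characters and mask the rest.

    Strings of 3 characters or fewer are returned unchanged. The fixed
    width of 10 mask characters is an assumption taken from the table's example.
    """
    if len(value) <= 3:
        return value
    return value[:3] + mask * width


print(redacted("4242 4242 4242 4242"))  # 424**********
print(keep_first_char("Lucas"))         # L
print(keep_first_char(12345))           # 1
print(random_same_length("AAA"))        # three random letters
```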

To use transformers, edit your configuration file and add them under source:

source:
  connection_uri: $DATABASE_URL
  encryption_key: $MY_PRIVATE_ENC_KEY # optional 
  transformers:
    - database: public
      table: employees
      columns:
        - name: last_name
          transformer_name: random
        - name: birth_date
          transformer_name: random-date
        - name: first_name
          transformer_name: first-name
        - name: email
          transformer_name: email
        - name: username
          transformer_name: keep-first-char
    - database: public
      table: customers
      columns:
        - name: phone
          transformer_name: phone-number
bridge:
  bucket: $BUCKET_NAME
  access_key_id: $ACCESS_KEY_ID
  secret_access_key: $AWS_SECRET_ACCESS_KEY

Your sensitive data will then be hidden when you seed your dev Postgres DB 👌
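
Conceptually, the config above is a lookup from (database, table, column) to a transformer name. The Python sketch below shows that idea on a single row. It is not how RepliByte is implemented: `TRANSFORMERS`, `RULES`, and `transform_row` are hypothetical names, only two transformers are stubbed in, and treating unlisted columns as unchanged is an assumption.

```python
from typing import Callable, Dict, Tuple

# Hypothetical stand-ins for two transformers from the table.
TRANSFORMERS: Dict[str, Callable[[str], str]] = {
    "transient": lambda value: value,            # leaves the value as-is
    "keep-first-char": lambda value: value[:1],  # keeps only the first character
}

# Mirrors the shape of the YAML: (database, table) -> {column: transformer_name}
RULES: Dict[Tuple[str, str], Dict[str, str]] = {
    ("public", "employees"): {"username": "keep-first-char"},
}


def transform_row(database: str, table: str, row: Dict[str, str]) -> Dict[str, str]:
    """Apply the configured transformer to each column of one row."""
    column_rules = RULES.get((database, table), {})
    return {
        # Assumption: columns without a rule pass through unchanged.
        column: TRANSFORMERS[column_rules.get(column, "transient")](value)
        for column, value in row.items()
    }


print(transform_row("public", "employees", {"username": "jdoe", "team": "infra"}))
# {'username': 'j', 'team': 'infra'}
```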

Seed your dev Postgres DB

To restore a backup, you first need to declare a destination in your YAML config file.

bridge:
  bucket: $BUCKET_NAME
  access_key_id: $ACCESS_KEY_ID
  secret_access_key: $AWS_SECRET_ACCESS_KEY
destination:
  connection_uri: $DATABASE_URL
  decryption_key: $MY_PUBLIC_DEC_KEY # optional

Then, run replibyte backup list to list all the available backups:

replibyte -c prod-conf.yaml backup list

type          name                    size    when                    compressed  encrypted
PostgreSQL    backup-1647706359405    154MB   Yesterday at 03:00 am   true        true
PostgreSQL    backup-1647731334517    152MB   2 days ago at 03:00 am  true        true
PostgreSQL    backup-1647734369306    149MB   3 days ago at 03:00 am  true        true

Finally, run replibyte restore to seed your dev database, either from the latest backup or from a specific backup by name:

replibyte -c prod-conf.yaml restore -v latest

# or

replibyte -c prod-conf.yaml restore -v backup-1647706359405

What else?

  • RepliByte is written in Rust and performs all operations on the fly, so no extra disk space is consumed and the risk of a data leak is reduced. ⚡️

  • RepliByte also supports MongoDB (thanks to Benny, a contributor) 🔥

  • Complete data synchronization 💪🏼

  • Works on any cloud provider 🌍

  • You can use multiple transformers to hide your sensitive data 🙈

  • Designed to back up terabytes of data 🏆

  • Skip data sync for specific tables 👌

  • On-the-fly data compression/decompression (Zlib) and encryption/decryption (AES-256) 🛡
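
To illustrate what "on the fly" means for the compression point above, here is a minimal sketch using Python's standard `zlib` module. It compresses a dump chunk by chunk, so the full dump is never held in memory or written to a temporary file. This is not RepliByte's Rust code; the function names and the sample rows are hypothetical.

```python
import zlib
from typing import Iterable, Iterator


def compress_stream(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Compress chunks as they arrive, yielding compressed bytes incrementally."""
    compressor = zlib.compressobj()
    for chunk in chunks:
        out = compressor.compress(chunk)
        if out:  # zlib may buffer small inputs and return nothing yet
            yield out
    tail = compressor.flush()  # emit whatever is still buffered
    if tail:
        yield tail


def decompress_stream(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Reverse of compress_stream, also processed chunk by chunk."""
    decompressor = zlib.decompressobj()
    for chunk in chunks:
        out = decompressor.decompress(chunk)
        if out:
            yield out
    tail = decompressor.flush()
    if tail:
        yield tail


# Hypothetical dump rows standing in for a real database export.
rows = [b"INSERT INTO employees VALUES (1, 'A');\n"] * 1000
restored = b"".join(decompress_stream(compress_stream(rows)))
print(restored == b"".join(rows))  # True
```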

Conclusion

RepliByte is a command-line tool that makes database seeding super easy and convenient. I am working on a way to restore a database locally with Docker in one command. More is coming, so stay tuned and feel free to share your feedback.

RepliByte GitHub: https://github.com/Qovery/replibyte
