 
 

I point to our cloud provider's "RDS" SQL solutions - they manage the day-to-day backups.

However, because I cannot get rid of the paranoid itch in me (what if AWS & GCP go down?), I do a full dump onto a dedicated company server in a local data center about every month or so (in Singapore, where I am). Each dump is independent of the previous months' dumps.

Just in case.

 

If you don't have legal locality-requirements, you could always do your backups to a replicated S3 bucket. That way, even if an entire region goes down, you'd still have recoverability. Then again, (and, again, if you don't have legal locality-requirements) you could probably also set up a job to directly replicate your RDSes from one region to another.

 

Yup - and it's quite awesome that it can be automated with a few clicks (in the right places).


Btw, while this may sound quite dumb: where I am, it is common for disaster recovery plans to require that backups not be held with the same company. You know, just in case the entire account gets deleted, with data purged and all.

Because our critical data is not "that" huge, dumping it works well for me to comply with that requirement (with encryption and all, of course).

Tin foil hat, so that our auditors and investors can feel safe knowing that our data is safe in the event of WW3, zombie apocalypse, etc. We have a recovery plan for that!

Never mind the fact that we do not have such a plan in place for food, water, zombies or people 😅

 

My background was IT/DevOPS/SecOPS before "inheriting" my way into development, so I got to influence our process when I came on board. Please excuse the wall of text and any misspelled wurds you see.

We re-did our whole backup strategy from a "whenever we thought about it" to an automated scheduled process with manual options for immediate needs:

On premise SQL data (MSSQL) has transaction logs snapped every 1/2-, 2-, or 4-hours depending on the database, and a full backup every night. I'm still working on convincing the people that control our MySQL server that we need a true replication scheme and backup strategy for that system.

We're finishing up moving our on-premise code repositories from an old TFS server to VSO, where they are also backed up to a cloud storage solution with rolling snapshots of all of the repositories every 4 hours.

Various on premise servers have a rolling differential backup done anywhere from every day to real-time depending on what kind of data it's got (the more mission-critical, the faster they get backed up).

The goal we set was to have point-in-time recovery on all of our mission-critical data while still being able to protect everything else and not blow the budget out of the water. The changes have saved us on more than one occasion: with this process the data loss was measured in seconds, instead of days under the previous one.
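To make the point-in-time idea concrete, here is a minimal sketch of how a restore might pick its files, assuming nightly full backups plus periodic transaction log backups as described above. The `Backup` record and `restore_chain` function are hypothetical names for illustration only; they are not part of MSSQL or any backup product.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass(frozen=True)
class Backup:
    kind: str           # "full" or "log" (hypothetical labels)
    taken_at: datetime  # when the backup finished


def restore_chain(backups: List[Backup], target: datetime) -> List[Backup]:
    """Return the newest full backup at or before `target`, followed by the
    log backups to replay after it, in restore order."""
    fulls = [b for b in backups if b.kind == "full" and b.taken_at <= target]
    if not fulls:
        raise ValueError("no full backup exists at or before the target time")
    base = max(fulls, key=lambda b: b.taken_at)

    logs = sorted(
        (b for b in backups if b.kind == "log" and b.taken_at > base.taken_at),
        key=lambda b: b.taken_at,
    )
    chain = [base]
    for log in logs:
        chain.append(log)
        if log.taken_at >= target:
            break  # this log backup covers the target time, so stop here
    # If no log backup reaches `target`, the chain ends at the last log taken
    # before it, and the restore can only get that far.
    return chain
```

The shorter the interval between log backups, the less data sits outside the chain when something breaks, which is why the more critical databases get the 30-minute schedule.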

If you made it this far, a bonus idea for you: every so often, take your backup file and try to restore it somewhere that's not important, like a dev server or test server. You'd be surprised how often the backup is corrupted or has some crazy quirk to get it to restore. 3AM the day of a 9AM release is not the time to discover this happened to you...
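A restore drill like that can be scripted. The sketch below uses SQLite purely as a stand-in, because it ships with Python; for MSSQL or MySQL the "restore" step would be the engine's own restore command instead of a file copy. `verify_backup` and the `expected_table` check are hypothetical names, not anyone's actual tooling.

```python
import shutil
import sqlite3
import tempfile
from pathlib import Path


def verify_backup(backup_file: Path, expected_table: str) -> bool:
    """Restore the backup into a scratch directory and run basic sanity checks."""
    with tempfile.TemporaryDirectory() as scratch:
        restored = Path(scratch) / "restored.db"
        shutil.copy2(backup_file, restored)  # the "restore" step for SQLite
        conn = None
        try:
            conn = sqlite3.connect(restored)
            # integrity_check returns a single row containing "ok" when healthy
            (status,) = conn.execute("PRAGMA integrity_check").fetchone()
            row = conn.execute(
                "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
                (expected_table,),
            ).fetchone()
            return status == "ok" and row is not None
        except sqlite3.DatabaseError:
            return False  # corrupted file, or not a database at all
        finally:
            if conn is not None:
                conn.close()
```

Running something like this on a schedule, against a machine nobody cares about, turns the 3AM surprise into a failed check during working hours.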

 

This sounds like a great strategy. Redundancy is so important. I also like the idea of spinning up the backed up data to see if the backup is right. I think that would be a critical element to any redundancy.

 

I know we have some kind of automated process on our server that runs at a fairly regular interval - but I don't know enough about it to explain it.

Sheepish Emoji

 

You'll know if it worked or not when disaster strikes. :p

 

Oh it works, I've been the disaster that has struck a few times now. LOL

Anyone that hasn't been is either lying to you or too new for it to have happened yet

 

I spent too much time writing a serverless Lambda function that goes into the server and uploads the backup to S3

github.com/coretabs-academy/postgr...

Guess what?
It didn't work in production, so I keep doing backups manually every once in a while 😂

 

We actually don't handle many databases, and luckily have near-certain periods of downtime when they're sitting idle. Given this, and the fact that we're stuck dealing with software written by idiots who didn't understand the concept of automation, we just back up the databases like we do any other files on the systems (albeit while forcing a commit just prior to the backup). We back them up using Borg (for Windows systems, the backup server accesses the data remotely) and then synchronize the Borg repository off-site using rclone.
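For anyone who wants to try the Borg-then-rclone flow, a rough sketch of the two commands follows. It assumes Borg 1.x syntax (`REPO::ARCHIVE`); the repository path, archive naming, and remote name are made up for illustration and are not the setup described above.

```python
from datetime import date
from typing import List


def borg_create_cmd(repo: str, paths: List[str], day: date) -> List[str]:
    """Build a `borg create` command that archives `paths` under a dated name."""
    archive = f"{repo}::db-{day.isoformat()}"  # hypothetical naming scheme
    return ["borg", "create", "--stats", archive, *paths]


def rclone_sync_cmd(repo: str, remote: str) -> List[str]:
    """Build an `rclone sync` command that mirrors the Borg repo off-site."""
    return ["rclone", "sync", repo, remote]
```

Each list can be handed to `subprocess.run(cmd, check=True)`, running the sync only after the create step succeeds so a failed backup never overwrites the off-site copy.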

 

We back up our database...? Well, we currently have a two-node MySQL-compatible Aurora cluster on AWS so that we have a hot standby with automatic promotion when the master goes offline. We still use backups in case we damage things and the damage gets replicated to the slave. Backups are something you simply specify on the AWS dashboard. We are then relying on AWS not losing all the backups at the point where we damaged the database, which seems okay.

 

Depends on where in the organization. In the legacy datacenters, "NetBackup". For those who've moved to the cloud, where the NetBackup domain doesn't reach and where architecting the connectivity to let a media server operate effectively isn't practical, it's "self-service".

Early on, we provided a set of simple reference scripts to show the cloud-users, "here's how you can use cron/scheduled tasks to leverage the CSP's native capabilities" (i.e., "volume snapshots" to provide at least crash-consistent recovery options). But, mostly, we've been encouraging those same users to leverage CSP options that include baked-in data-backup options.

 

For legacy systems we use scripts run via cron jobs that dump databases (this one sucks due to disk space potentially being used up during backup) and more recently automated snapshots via snapshooter.io
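One way to take the sting out of the disk space problem is to check free space before the dump starts. This is a minimal sketch, assuming you have a rough idea of the dump size (for example, the size of the previous dump); `has_room_for_dump` and the 1.5x headroom are hypothetical choices.

```python
import shutil


def has_room_for_dump(dump_dir: str, expected_dump_bytes: int,
                      headroom: float = 1.5) -> bool:
    """True if `dump_dir` has at least `headroom` times the expected dump size free."""
    free = shutil.disk_usage(dump_dir).free
    return free >= expected_dump_bytes * headroom
```

The cron script can call this first and bail out with an alert instead of filling the disk halfway through a dump.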

For dev machines an external hard drive using time machine works great :)

 

For our part we have cloud backups for our client data and also like to keep paper copies of contracts and other documents. We wouldn't want a Google Drive crisis to impact our business, for instance.

For code we use a GitHub organization and keep a copy of each project's code there as well as on an off-site Raspberry Pi git server, which itself copies the code onto multiple hard drives.

We also do manual backups of our computers to hard drives, which are then stored away in a small fireproof safe in case of emergency. (Non-network-connected backups are great in case of ransomware, for instance.)

 
 

We use Heptio Velero to automatically back up everything on our Kubernetes clusters, including persistent volumes (which are AWS EBS based). Backups are stored in an S3 bucket. I plan to enable cross-region replication for the bucket some time in the future, to have some redundancy in case AWS blows up a region.

We have scheduled weekly, daily or hourly backups based on how critical each project is. For production databases, we rely on AWS RDS with automatically scheduled backups, and for very important projects we use AWS Aurora with a self-healing master-slave setup.

 

We use RDS for all our databases, which gets you data as recent as 5 minutes ago, and you can restore from a snapshot in some reasonable amount of time (restoring 20 GB is quick, 5 TB a smidge longer). The big caveat here is that if you accidentally delete an RDS instance (happened once in a dev environment), all the automated backups disappear along with it 🤬.

We haven't gone as far as doing cross-region replication of an RDS instance. We did it once as a test to make sure our terraform scripts could spin up an environment in another region, so we know we can do it if needed. We haven't yet needed it 🤞

For static sites hosted on S3 + Cloudfront - we use cross-region replication with S3, using cloudfront for failover. This is a particularly inexpensive and easy solution for a resilient static site - history has shown S3 craps out about once a year.

EC2 instances are baked or configured with Chef/Bash, and send logs to Splunk, so we don't care about any data on the host.

I find it useful to think of backups/recovery in terms of:

  • how fast you want to restore the data
  • how recent that data can be
  • how much you want to pay
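Those three questions can be turned into a simple filter. The sketch below is only an illustration of the trade-off; the `Strategy` record, the function name, and any numbers you feed it are hypothetical, not measurements from our setup.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Strategy:
    name: str
    restore_minutes: int        # how fast you can restore the data
    max_data_loss_minutes: int  # how stale the restored data can be
    monthly_cost: float         # how much you pay


def cheapest_strategy(options: List[Strategy], max_restore_minutes: int,
                      max_data_loss_minutes: int,
                      budget: float) -> Optional[Strategy]:
    """Pick the cheapest option that meets all three limits, or None if none does."""
    fits = [
        s for s in options
        if s.restore_minutes <= max_restore_minutes
        and s.max_data_loss_minutes <= max_data_loss_minutes
        and s.monthly_cost <= budget
    ]
    return min(fits, key=lambda s: s.monthly_cost, default=None)
```

Getting `None` back is the useful result: it means the requirements and the budget do not fit together, and one of the three has to give.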
 

That depends on the scope and use case, but we use a combination of disk snapshots, full database, and if need be intermittent and real time. Backups are E2E encrypted and stored encrypted on Google Cloud Storage buckets. So it can certainly vary significantly. Can you define what specific data types you need to backup, or is this for general consumption?

 
 

An old sysadmin joke (but there is a grain of truth in every joke):

  • Nobody cares about the backup!
  • But everybody cares about the restore!!! :)
 

The big trucks go "beep beep beep" when in reverse, so ...

 

daily rsync helps with scripts kept on a build server (going to the secondary)

DB backups are weekly incrementals in production DBs

 
 

Wrote an AWS Lambda function that backs up to S3. For large backups that can't fit in Lambda, writing a pass-through stream does the job.
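The pass-through idea, roughly: read the backup in fixed-size chunks and hand each chunk straight to the destination, so the whole file never has to sit in memory or on the function's local disk. This is a generic sketch with hypothetical names, not the actual Lambda code; in practice the sink would be something that uploads as it goes (boto3's `upload_fileobj`, which accepts a file-like object, is one option).

```python
from typing import BinaryIO, Iterator

DEFAULT_CHUNK = 8 * 1024 * 1024  # 8 MiB; an arbitrary choice for illustration


def iter_chunks(source: BinaryIO, chunk_size: int = DEFAULT_CHUNK) -> Iterator[bytes]:
    """Yield the source in chunks of at most `chunk_size` bytes."""
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            return
        yield chunk


def pass_through(source: BinaryIO, sink: BinaryIO,
                 chunk_size: int = DEFAULT_CHUNK) -> int:
    """Copy source to sink chunk by chunk and return the number of bytes copied."""
    total = 0
    for chunk in iter_chunks(source, chunk_size):
        sink.write(chunk)
        total += len(chunk)
    return total
```

Memory use stays at about one chunk no matter how large the backup is, which is the whole point when the function's limits are smaller than the dump.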
