Filipe Motta for AWS Community Builders

Posted on Jun 22, 2021

Backup your file server or any data using Hybrid Cloud - Storage Gateway Solution (AWS)

#aws #linux

Introduction

You can use cloud storage for on-premises data backups to reduce infrastructure and administration costs. There is several ways to backup your on-premise infrastructure to cloud, here I’ll use hybrid cloud (part of your infrastructure is on the cloud and part of your infrastructure is on the on-premise) to accomplish it. In this post I’ll show AWS Storage Gateway solutions to bridge between on-premise data to cloud. The common use case is disaster recovery, backup & restore and tiered storage.

There is 3 types of Storage Gateway:

Before show how I implemented it, I’ll show a brief overview about each one.

File Gateway

Databases and applications are often backed up directly to a file server on-premises. You can now simply point these backups to the File Gateway, which copies the data to Amazon S3. Supports S3 standard, S3 IA, S3 One Zone IA so you can configure your bucket policy to move this data to any storage class in Amazon S3, depending your needs.

Features:

NFS/SMB protocol supports
Lifecyles policies in Amazon S3
Windows ACL
Most recently used data is cached in the file gateway
Support S3 Object Lock
Bandwith optimized
Can be mounted on many servers

Volume Gateway

The Volume Gateway provides either a local cache or full volumes on premises while also storing full copies of your volumes in the AWS Cloud. Volume Gateway also provides Amazon EBS snapshots of your data for backup or disaster recovery.

Features:

Block storage using iSCSI protocol backed by S3
Backed by EBS snapshots which can help restore on-premise volumes
On-premises cache of recently accessed data
Two types of Storage Volume Gateway: * Cached volumes: low latency access to most recent data * Stored volumes: entire dataset is on premise, scheduled backups to S3

Tape Gateway

You can use Tape Gateway to replace physical tapes with virtual tapes in AWS. Tape Gateway acts as a drop-in replacement for tape libraries, tape media, and archiving services, without requiring changes to existing software or archiving workflows. Most used for enterprise backup purpose.

Features:

VirtualTape Library (VTL) backed by Amazon S3 and Glacier
Back up data using existing tape-based processes (and iSCSI interface)
Works with leading backup software vendors
In Summary, if you use: File Gateway => File access / NFS (backed by S3) Volume Gateway => Volumes / Block Storage / iSCSI (backed by S3 with EBS snapshots) Tape Gateway => VTLTape solution / Backup with iSCS (backed by S3 and Glacier)

The Volume Gateway Implementation

As I said before, there are many ways to choose your strategy, each company has individual needs. Volume Gateway stores and manages on-premises data in Amazon S3 on your behalf and operates in either cache mode or stored mode. In my case, even though a backup partition is a file share, I chose the Volume Gateway type because of the following features:

I need on-premises cache of recently accessed data to be accessed was needed (Provides low latency access to cloud-backed storage)
I need backed by EBS snapshots which restore on-premise volumes when needed ( I used Volume Gateway in conjunction with Linux file servers on premises to provide scalable storage for on-premises file applications with cloud recovery options. I used a stored volume architecture, to store all data locally and asynchronously back up point-in-time snapshots to AWS.)
With stored volumes you can store your primary data locally, while asynchronously backing up that data to AWS. Stored volumes provide your on-premises applications with low-latency access to their entire datasets. At the same time, they provide durable, offsite backups. You can create storage volumes and mount them as iSCSI devices from your on-premises application servers. Data written to your stored volumes is stored on your on-premises storage hardware. This data is asynchronously backed up to Amazon S3 as Amazon Elastic Block Store (Amazon EBS) snapshots. You can maintain your volume storage on-premises in your data center. That is, you store all your application data on your on-premises storage hardware. This solution is ideal if you want to keep data locally on-premises, because you need to have low-latency access to all your data, and also to maintain backups in AWS.

See the following diagram about stored volumes architecture:

You can deploy Volume Gateway as a virtual machine or on Ec2 instance. In this case, I have used as virtual machine architecture on premise infrastructure, but you can use as you want.

Create a Gateway Type

Before you create a volume to store data, you need to create a gateway and specify the kind of gateway you’ll use.

First, Open the AWS Management Console at https://console.aws.amazon.com/storagegateway/home, and choose the AWS Region that you want to create your gateway in.

Now it is necessary to choose a Host Platform and Downloading the VM appliance. You have to choose a hypervisor option, deploy the downloaded image to your hypervisor. Add at least one local disk for your cache and one local disk for your upload buffer during the deployment. See some requirements here:

Hardware requirements for on-premises VMs

When deploying your gateway on-premises, you must make sure that the underlying hardware on which you deploy the gateway VM can dedicate the following minimum resources:

Four virtual processors assigned to the VM.

16 GiB of reserved RAM for file gateways

For volume and tape gateways, your hardware should dedicate the following amounts of RAM:

16 GiB of reserved RAM for gateways with cache size up to 16 TiB

32 GiB of reserved RAM for gateways with cache size 16 TiB to 32 TiB

48 GiB of reserved RAM for gateways with cache size 32 TiB to 64 TiB

80 GiB of disk space for installation of VM image and system data.

Depending your hypervisor and if you set on premise infrastructure, you have to check certain options according each hypervisor. In my case I have to setup the following:

VMware Setup
Store your disk using the Thick provisioned format option. When you use thick provisioning, the disk storage is allocated immediately, resulting in better performance. In contrast, thin provisioning allocates storage on demand. On-demand allocation can affect the normal functioning of AWS Storage Gateway. For Storage Gateway to function properly, the VM disks must be stored in thick-provisioned format.

Configure your gateway VM to use paravirtualized disk controllers

Now you have to choose a Service Endpoint. This is used to how your gateway will communicate with AWS storage services over the public internet. In my case I used public service endpoint.

The next step is Connecting Your Gateway. To do this it is necessary to get the IP address or activation key of your gateway VM. So, in my vmware environment I have connected through console and set the IP address. So, to connect gateway, verify that your gateway VM is running for activation to succeed and if you can access the IP address that you have setup previously.

Now you have to configure your gateway to use the disks that you have created when you deployed the appliance and according the requirements of the type of the gateway. As I have used for stored volume purpose, I had to configure the upload buffer according the requirements. Bellow, are table with Depending your hypervisor and if you set on premise infrastructure, you have to check certain options according each hypervisor. In my case I have to setup the following:

Configure your gateway VM to use paravirtualized disk controllers

Now you have to choose a Service Endpoint. This is used to how your gateway will communicate with AWS storage services over the public internet. In my case I used public service endpoint.

Now you have to configure your gateway to use the disks that you have created when you deployed the appliance and according the requirements of the type of the gateway. As I have used for stored volume purpose, I had to configure the upload buffer according the requirements. Bellow, are table with differents sizes requirements according the type you will use. size requirements according the type you will use.

Create a Volume

Once the gateway was created, its time to create storage volume to which your applications read and write data. Previously, you have allocated disks to upload buffer in the case of stored volume - Volume Gateway.

As you can see bellow, to create a volume in the storage gateway created, you need to select it, select the disk ID that you have create to stored data, in this case it depends which kind of hypervisor you are use.

Note that in this step, if you would like to restore data from EBS volume you need to select it and specify the snap ID you want. You can use a existing disk too and new empty volume, which is our case.

You need to specify ISCSI target volume too.

You will be asked to configure chap authentication, a provider protection against playback attacks by requiring authentication to access storage volume targets. If you do no accept it, the volume will accept connections from any ISCSI initiator.

Once created a volume, you will see the volumes availables to be mounted on your initiator.

Using a Volume

In my case, my purpose was used hybrid cloud to store samba files on-premise and have backup protection with snapshots on the cloud. So, I will show you how to create a partition to use a disk created previously.

Depending which Linux Operation System distribution you are used, it is necessary to install a packager to manager ISCSI ( iscsi-initiator-utils - Red Hat Enterprise Linux Client )

Once installed, discover the volume or VTL device targets defined for a gateway. Use the following discovery command.

iscsiadm --mode discovery --type sendtargets --portal [GATEWAY_IP]:3260

The output of the discovery command should like this:

[GATEWAY_IP]:3260,1 iqn.1997-05.com.amazon:part-home
[GATEWAY_IP]:3260,1 iqn.1997-05.com.amazon:disk-test-purpose

Now its is necessary to connect to a target.

iscsiadm --mode node --targetname iqn.1997-05.com.amazon:[ISCSI_TARGET_NAME] --portal [GATEWAY_IP]:3260,1 --login

Make sure to replace the [ISCSI_TARGET_NAME] to value that you have setup previously and obsviously, the gateway IP, the command should like this:

iscsiadm --mode node --targetname iqn.1997-05.com.amazon:disk-test-purpose --portal [GATEWAY_IP]:3260,1 --login

The successful output should like this:

Logging in to [iface: default, target: iqn.1997-05.com.amazon:disk-test-purpose, portal: [GATEWAY_IP],3260] (multiple)
Login to [iface: default, target: iqn.1997-05.com.amazon:disk-test-purpose, portal: [GATEWAY_IP],3260] successful.

Verify that the volume is attached to the client machine (the initiator). To do so, use the following command.

ls -l /dev/disk/by-path
total 0
lrwxrwxrwx 1 root root  9 abr  7 09:59 ip-[GATEWAY_IP]:3260-iscsi-iqn.1997-05.com.amazon:disk-test-purpose-lun-0 -> ../../sdd

Verify the volume attached in our case is /dev/sdd, so we will format to use this device.

Formatting Your Volume using Logical Volumes

Now lets use LVM to provides more flexibility and can also be resized dynamically when needed without any restarts.

The first thing to do is to create physical disk on the device created previously on the storage gateway and initiated by the ISCSI.

pvcreate /dev/sdd
Physical volume "/dev/sdd" successfully created.

Verify that the physical volume was created.

pvdisplay /dev/sdd
  "/dev/sdd" is a new physical volume of "20,00 GiB"
  --- NEW Physical volume ---
  PV Name               /dev/sdd
  VG Name               
  PV Size               20,00 GiB
  Allocatable           NO
  PE Size               0   
  Total PE              0
  Free PE               0
  Allocated PE          0
  PV UUID               Vy51TU-sSDDF-SSDFD-Hnzz-a0uP-u3WP-gI0E7C

Now you need to create volume group or add a existing one. For example, I already had a disk with 500GB allocated, if I need do expand it, I only need to add the disk to a existing volume group and expand a existing logical volume. But for now, we will create a volume group and create logical volume. Now, lets create a volume group and add physical volume created.

vgcreate stg_gtw /dev/sdd
Volume group "stg_gtw" successfully created

vgdisplay stg_gtw
  --- Volume group ---
  VG Name               stg_gtw
  System ID             
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  1
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                0
  Open LV               0
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               <20,00 GiB
  PE Size               4,00 MiB
  Total PE              5119
  Alloc PE / Size       0 / 0   
  Free  PE / Size       5119 / <20,00 GiB
  VG UUID               ZYAOrS-zgsdd-To9k-SDDS-gwpn-xPzo-DEdFDF

So, lets create a logical volume with 100% Volume group size (In this case 20GB):

lvcreate -l 100%FREE -n lv_disk_teste stg_gtw
Logical volume "lv_disk_teste" created.

lvdisplay /dev/stg_gtw/lv_disk_teste
  --- Logical volume ---
  LV Path                /dev/stg_gtw/lv_disk_teste
  LV Name                lv_disk_teste
  VG Name                stg_gtw
  LV UUID                EqsSDDe-OsGG4-a0Tr-djFFGX-fKSy-Enu6-jCDF1QV
  LV Write Access        read/write
  LV Creation host, time .local, 2021-04-07 10:36:17 -0300
  LV Status              available
  # open                 0
  LV Size                <20,00 GiB
  Current LE             5119
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           254:4

Once created, lets format the partition to store data.

> mkfs.ext4 /dev/stg_gtw/lv_disk_teste
mke2fs 1.44.5 (15-Dec-2018)
Creating filesystem with 5241856 4k blocks and 1310720 inodes
Filesystem UUID: s23e234-569c-4fdf-a4b2-5856e790e3fa
Superblock backups stored on blocks: 
    32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 
    4096000

Allocating group tables: done                            
Writing inode tables: done                            
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done   

> mkdir /mnt/disk-teste-purpose
> mount /dev/stg_gtw/lv_disk_teste /mnt/disk-teste-purpose
> df -h
Sist. Arq.                          Tam. Usado Disp. Uso% Montado em
udev                                2,0G     0  2,0G   0% /dev
tmpfs                               393M   26M  368M   7% /run
/dev/mapper/vg--root-lv--root        14G  2,9G   11G  22% /
tmpfs                               2,0G     0  2,0G   0% /dev/shm
tmpfs                               5,0M     0  5,0M   0% /run/lock
tmpfs                               2,0G     0  2,0G   0% /sys/fs/cgroup
/dev/mapper/storageGW-lv_volHomeGW  493G  398G   70G  86% /mnt/storageGW
tmpfs                               393M     0  393M   0% /run/user/0
/dev/mapper/stg_gtw-lv_disk_teste    20G   45M   19G   1% /mnt/disk-teste-purpose

Finally, its done !! You can now store your files or any data (any database and so on) in the partition created and its was automatically backup to Amazon AWS with EBS Snapshots.

The last tip, if you want to startup the initiator on boot you need to start node automatically editing the follow file:

vim /etc/iscsi/nodes/iqn.1997-05.com.amazon:disk-test-purpose/[GATEWAY_IP],3260,1/default

set the option node.startup to automatic.

...
node.startup = automatic
...

Obviouslly, you have to setup fstab to mount the file system on boot and have the scsci service to startup on boot too.

Now you can create plan for EBS snapshot, so you can schedule your snapshots to be automatically created specifying the period and retention.

I hope this post was useful for you.

Oldest comments (1)

Welkson Renny de Medeiros • Aug 2 '21

Excelente post!

Obrigado @filipemotta !

Welkson Medeiros
Natal/RN