DEV Community

santhoshnimmala


Migrating existing files to FSx for Windows File Server using AWS DataSync

Hi, I'm Santhosh Nimmala. I work at Luxoft as a Principal Consultant, leading Cloud and DevOps in the TRM space. In upcoming articles I will cover DevOps and DevTools with respect to AWS, including real-world DevOps projects with code and common DevOps patterns.

Besides migrating existing files into FSx for Windows File Server, you can use AWS DataSync to transfer files between two FSx for Windows File Server file systems, including file systems in different AWS Regions and file systems owned by different AWS accounts. DataSync also covers other tasks with FSx for Windows File Server file systems. For example, you can perform one-time data migrations, periodically ingest data for distributed workloads, and schedule replication for data protection and recovery.

In AWS DataSync, a location for FSx for Windows File Server is an endpoint for an FSx for Windows File Server file system. You can transfer files between a location for FSx for Windows File Server and a location for another file system.


Components and terminology
The components of DataSync include the following:

Agent: A virtual machine (VM) that's used to read data from or write data to a self-managed location. An agent isn't required when transferring between AWS storage services in the same AWS account.

Location: Any source or destination location included in the data transfer. This can include Amazon S3, Amazon EFS, Amazon FSx for Windows File Server, Amazon FSx for Lustre, Amazon FSx for OpenZFS, Network File System (NFS), Server Message Block (SMB), Hadoop Distributed File System (HDFS), or self-managed object storage.

Task: A source location and a destination location, and a configuration that defines how data is transferred. A task always transfers data from the source to the destination. The configuration can include options such as the task schedule, bandwidth limit, and so on. A task is the complete definition of a data transfer.

Task execution: An individual run of a task, which includes information such as the start time, end time, bytes written, and status.
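To make the terminology concrete, here is a minimal sketch of how two locations and a task might map onto boto3 DataSync calls. The helper names, host name, share path, user, and task name are all placeholders invented for illustration, not values from this walkthrough. The block only builds request dictionaries; `create_all`, which would send them, needs boto3 and AWS credentials and is not called here.

```python
"""Sketch: parameters for an SMB source location, an FSx for Windows
destination location, and the task that connects them.
Every concrete value below is a placeholder."""


def smb_source_params(agent_arn: str) -> dict:
    # Self-managed SMB share, read through the on-premises agent.
    return {
        "ServerHostname": "fileserver.example.internal",  # placeholder host
        "Subdirectory": "/share",                         # placeholder share
        "User": "datasync-user",                          # placeholder user
        "Password": "REPLACE_ME",
        "AgentArns": [agent_arn],
    }


def fsx_destination_params(fsx_arn: str, security_group_arn: str) -> dict:
    # AWS-managed destination: no agent is listed for this location.
    return {
        "FsxFilesystemArn": fsx_arn,
        "SecurityGroupArns": [security_group_arn],
        "User": "datasync-user",                          # placeholder user
        "Password": "REPLACE_ME",
        "Subdirectory": "/share",
    }


def task_params(source_arn: str, destination_arn: str,
                bytes_per_second: int = -1) -> dict:
    # A task always copies from source to destination.
    # BytesPerSecond is the bandwidth limit; -1 means unlimited.
    return {
        "Name": "onprem-to-fsx",                          # placeholder name
        "SourceLocationArn": source_arn,
        "DestinationLocationArn": destination_arn,
        "Options": {"BytesPerSecond": bytes_per_second},
    }


def create_all(agent_arn: str, fsx_arn: str, sg_arn: str) -> str:
    """Send the requests. Requires boto3 and credentials; not called here."""
    import boto3

    ds = boto3.client("datasync")
    src = ds.create_location_smb(**smb_source_params(agent_arn))
    dst = ds.create_location_fsx_windows(
        **fsx_destination_params(fsx_arn, sg_arn))
    task = ds.create_task(
        **task_params(src["LocationArn"], dst["LocationArn"]))
    return task["TaskArn"]
```

Keeping the parameter builders separate from the API calls makes it easy to review or log exactly what will be sent before anything is created.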

How DataSync transfers files from on-premises to the cloud

  1. Choose the VPC and subnet where you’d like to set up the DataSync private IPs. This should be a VPC that extends to your on-premises environment via routing rules over Direct Connect or VPN. All communication between your DataSync agent and the DataSync service remains in this VPC.

  2. Deploy a DataSync agent on-premises, where it can access your source storage location via NFS or SMB. The OVA for the agent deployment can be downloaded from the DataSync console. Your agent does not need a public IP.

Note that a single agent can only be used for one type of transfer: over private or public endpoints. If you have an existing agent transferring data over the public internet, you’ll need to deploy a new agent to transfer data to private DataSync endpoints.

  3. Create a security group that controls access to the private IPs DataSync will use: a single VPC endpoint for control traffic and four ENIs for the data transfer. Because the agent initiates the connections to these IPs, configure inbound rules that allow the agent’s private IP (172.31.60.250 in the screenshot) to connect to them. The agent needs to reach ports 1024-1064, port 443, and port 22.

Note: No outbound rules are required. When configuring the security group, remember to select the VPC you chose in step 1.
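The inbound rules from the security group step could be expressed roughly as follows. This is a sketch, not the exact rules from the screenshot: the helper names are hypothetical, and I am assuming all three port ranges are TCP and that a /32 rule for the single agent IP is what you want. `apply_rules` needs boto3 and credentials and is not called here.

```python
import ipaddress

# Ports the agent must reach, taken from the step above.
AGENT_PORT_RANGES = [(1024, 1064), (443, 443), (22, 22)]


def agent_ingress_permissions(agent_ip: str) -> list:
    """Build EC2 IpPermissions that allow one agent IP on the DataSync ports."""
    # Raises ValueError if agent_ip is not a valid IPv4 address.
    cidr = f"{ipaddress.IPv4Address(agent_ip)}/32"
    return [
        {
            "IpProtocol": "tcp",  # assumption: all three ranges are TCP
            "FromPort": low,
            "ToPort": high,
            "IpRanges": [{"CidrIp": cidr, "Description": "DataSync agent"}],
        }
        for low, high in AGENT_PORT_RANGES
    ]


def apply_rules(group_id: str, agent_ip: str) -> None:
    """Attach the rules to an existing security group. Not called here."""
    import boto3

    ec2 = boto3.client("ec2")
    ec2.authorize_security_group_ingress(
        GroupId=group_id,
        IpPermissions=agent_ingress_permissions(agent_ip),
    )
```

For example, `agent_ingress_permissions("172.31.60.250")` yields three rules, one per port range, each limited to `172.31.60.250/32`.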


  4. Create a VPC endpoint for the DataSync service. In the Amazon VPC console, choose Endpoints from the navigation pane on the left, and click Create Endpoint. For Service category, choose AWS service. For Service Name, choose DataSync in your region (e.g. com.amazonaws.us-east-1.datasync). Then select the VPC and security group you chose in the first and third steps, respectively. Make sure you uncheck Enable Private DNS Name.
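If you prefer to script the endpoint creation instead of using the console, a rough boto3 equivalent might look like this. The function names are hypothetical and the IDs you pass in are your own. The block builds the request with private DNS disabled, matching the console instruction above; `create_endpoint` needs boto3 and credentials and is not called here.

```python
def datasync_endpoint_params(region: str, vpc_id: str, subnet_id: str,
                             security_group_id: str) -> dict:
    """Build the request for an interface endpoint to the DataSync service."""
    return {
        "VpcEndpointType": "Interface",
        "ServiceName": f"com.amazonaws.{region}.datasync",
        "VpcId": vpc_id,                          # VPC chosen in the first step
        "SubnetIds": [subnet_id],
        "SecurityGroupIds": [security_group_id],  # group from the third step
        "PrivateDnsEnabled": False,               # "Enable Private DNS Name" unchecked
    }


def create_endpoint(region: str, vpc_id: str, subnet_id: str,
                    security_group_id: str) -> str:
    """Create the endpoint and return its ID. Not called here."""
    import boto3

    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.create_vpc_endpoint(
        **datasync_endpoint_params(region, vpc_id, subnet_id, security_group_id))
    return response["VpcEndpoint"]["VpcEndpointId"]
```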

  5. Once the VPC endpoint you created becomes available, make sure the network configuration for your on-premises environment allows agent activation. Activation is a one-time operation that securely associates the agent with your AWS account. To activate the agent, use a computer that can reach the agent via port 80. After activation, this access can be revoked. The agent should also be able to reach the private IP of the VPC endpoint you created in step 4. To find this IP, navigate to the Amazon VPC console and choose Endpoints from the navigation pane on the left. Select the DataSync endpoint and look in the Subnets tab. There you can find the private IP that corresponds to the subnet you chose.
  6. You’re now ready to activate your agent. If you have a computer that can route to the agent via port 80 and can also access the DataSync console, navigate to the console and choose Create agent. In the service endpoint section of the form, select VPC endpoints using AWS PrivateLink.

Select the VPC endpoint created in step 4, the subnet you chose in step 1, and the security group created in step 3. Enter your agent’s IP. If you cannot access the agent and the DataSync console using the same computer, you can activate the agent using the command line from a computer that can reach the agent’s port 80.
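For the command-line route, the activation key is fetched from the agent over HTTP on port 80. The sketch below builds that request URL. The query parameter names are as I recall them from the AWS DataSync documentation for private endpoints, so treat them as an assumption and check them against the current docs. The function names are hypothetical, and `fetch_activation_key` is not called here because it needs network access to a real agent.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def activation_url(agent_ip: str, region: str, endpoint_ip: str) -> str:
    """Build the URL an agent serves its activation key on (port 80)."""
    query = urlencode({
        "gatewayType": "SYNC",
        "activationRegion": region,
        "privateLinkEndpoint": endpoint_ip,  # private IP of the VPC endpoint
        "endpointType": "PRIVATE_LINK",
    })
    # no_redirect asks the agent to return the key instead of redirecting
    # the caller to the console.
    return f"http://{agent_ip}/?{query}&no_redirect"


def fetch_activation_key(agent_ip: str, region: str, endpoint_ip: str) -> str:
    """Request the key from the agent. Not called here."""
    with urlopen(activation_url(agent_ip, region, endpoint_ip), timeout=30) as resp:
        return resp.read().decode().strip()
```

Run this from a machine that can reach the agent on port 80; the returned key is what the console's Get Key button would otherwise retrieve for you.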


  7. Choose Get Key, optionally enter an agent name and tags, and choose Create agent. Your new agent is now visible in the Agents tab of the DataSync console. The green VPC Endpoint banner indicates that all tasks performed via this agent will use private endpoints, without traversing the public internet.
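Outside the console, the same registration can be sketched with boto3's `create_agent`, which takes the subnet and security group as ARNs rather than plain IDs. The helper names and the way the ARNs are assembled here are my own illustration. `register_agent` needs boto3, credentials, and a valid activation key, and is not called here.

```python
def agent_params(activation_key: str, name: str, region: str, account_id: str,
                 vpc_endpoint_id: str, subnet_id: str,
                 security_group_id: str) -> dict:
    """Build the create_agent request for an agent using a private endpoint."""
    ec2_arn_prefix = f"arn:aws:ec2:{region}:{account_id}"
    return {
        "ActivationKey": activation_key,
        "AgentName": name,                  # optional in the console
        "VpcEndpointId": vpc_endpoint_id,   # endpoint from step 4
        "SubnetArns": [f"{ec2_arn_prefix}:subnet/{subnet_id}"],
        "SecurityGroupArns": [
            f"{ec2_arn_prefix}:security-group/{security_group_id}"],
    }


def register_agent(**kwargs) -> str:
    """Activate the agent and return its ARN. Not called here."""
    import boto3

    datasync = boto3.client("datasync", region_name=kwargs["region"])
    return datasync.create_agent(**agent_params(**kwargs))["AgentArn"]
```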

That’s it. You can now start your task, see its progress in the DataSync console, and re-run regularly to pick up incremental updates as needed.
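Starting a task and watching its progress can also be scripted. Below is a minimal polling sketch; `wait_for_execution` is a hypothetical helper, the 30-second interval is arbitrary, and treating only `SUCCESS` and `ERROR` as final states is my assumption. The client is passed in, so in practice you would hand it `boto3.client("datasync")`.

```python
import time

# Assumption: these are the states after which polling can stop.
TERMINAL_STATUSES = {"SUCCESS", "ERROR"}


def wait_for_execution(client, task_arn: str, poll_seconds: int = 30,
                       sleep=time.sleep) -> dict:
    """Start one task execution and poll until it reaches a terminal status.

    Returns the last describe_task_execution response, which includes
    fields such as Status and BytesWritten.
    """
    execution_arn = client.start_task_execution(
        TaskArn=task_arn)["TaskExecutionArn"]
    while True:
        info = client.describe_task_execution(TaskExecutionArn=execution_arn)
        if info["Status"] in TERMINAL_STATUSES:
            return info
        sleep(poll_seconds)  # wait before asking again
```

Calling the function again later starts a new execution of the same task, which is how a re-run picks up incremental updates.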
