AWS DataSync is an online data transfer service that simplifies, automates, and accelerates moving data between on-premises storage systems and AWS storage services, and also between AWS storage services. DataSync can copy data between Network File System (NFS), Server Message Block (SMB) file servers, self-managed object storage, Amazon Simple Storage Service (Amazon S3) buckets, Amazon EFS file systems, and Amazon FSx for Windows File Server file systems.
To understand how DataSync works, we need to understand a few key components.
Agent: A VM that you own, used to read data from or write data to self-managed storage systems. The agent can be deployed on VMware ESXi, KVM, or Microsoft Hyper-V hypervisors, or it can be launched as an Amazon EC2 instance.
Location: Any source or destination that is used in the data transfer (for example, Amazon S3, Amazon EFS, Amazon FSx for Windows File Server, NFS, SMB, or self-managed object storage).
Task: Consists of a source location, a destination location, and the configuration that defines how data is transferred. Configuration settings can include options such as how to treat metadata, deleted files, and permissions.
Task execution: An individual run of a task, which includes information such as start time, end time, bytes written, and status. As it runs, a task execution moves through the following statuses:
- In the LAUNCHING status, DataSync initializes the task execution.
- In the PREPARING status, DataSync examines the source and destination file systems to determine which files to sync.
- In the TRANSFERRING status, DataSync starts transferring files and metadata from the source file system to the destination.
- In the VERIFYING status, DataSync verifies consistency between the source and destination file systems.
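The status progression above can be watched from code by polling the task execution until it finishes. The following is a minimal sketch, not a definitive implementation. It assumes a client object that exposes `describe_task_execution(TaskExecutionArn=...)` and returns a dictionary with a `Status` key, which is how the boto3 DataSync client behaves. The helper name `wait_for_execution`, the `poll_seconds` default, and the exact sets of status strings (including the terminal `SUCCESS` and `ERROR` values, which the list above does not mention) are assumptions to check against the current API reference.

```python
import time

# Assumed status strings for a DataSync task execution. The four phases
# listed above are expected to end in a terminal SUCCESS or ERROR status.
IN_PROGRESS = {"QUEUED", "LAUNCHING", "PREPARING", "TRANSFERRING",
               "VERIFYING", "CANCELLING"}
TERMINAL = {"SUCCESS", "ERROR"}


def wait_for_execution(client, execution_arn, poll_seconds=30, sleep=time.sleep):
    """Poll a task execution until it reaches a terminal status.

    `client` is any object with a describe_task_execution method (for
    example a boto3 DataSync client). Returns the distinct statuses
    observed, in the order they were seen.
    """
    seen = []
    while True:
        response = client.describe_task_execution(TaskExecutionArn=execution_arn)
        status = response["Status"]
        # Record a status only when it differs from the previous one.
        if not seen or seen[-1] != status:
            seen.append(status)
        if status in TERMINAL:
            return seen
        if status not in IN_PROGRESS:
            raise RuntimeError(f"unexpected status: {status}")
        sleep(poll_seconds)
```

With boto3 installed and credentials configured, the client would typically come from `boto3.client("datasync")`, and the execution ARN from the `TaskExecutionArn` field returned by `start_task_execution(TaskArn=...)`. Passing the client and the `sleep` function as parameters keeps the helper testable without an AWS account.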
Source: AWS Documentation