DEV Community

Paulet Wairagu


QN : Introduction to Azure Data Lake Storage Gen2

  • Data lake: a repository of data stored in its natural (raw) format, usually as blobs or files.

  • Azure Data Lake Storage (ADLS) is a comprehensive, massively scalable, secure, and cost-effective data lake solution for high-performance analytics, built into Azure.

  • ADLS is optimized for analytical workloads and supports high data volumes for both streaming and batch solutions.

  • ADLS exposes data as a hierarchical file system through API endpoints, making it accessible to modern compute technologies, e.g. Azure Databricks.
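
As a rough illustration of those endpoints, the sketch below builds the two address forms commonly used to reach Data Lake Storage: the HTTPS `dfs` endpoint and the `abfss://` URI used by Spark-based engines such as Azure Databricks. The account, container, and path names are hypothetical placeholders, and the two helper functions are illustrative only, not part of any Azure SDK.

```python
def dfs_endpoint(account: str) -> str:
    """HTTPS endpoint of the Data Lake Storage (dfs) service for a storage account."""
    return f"https://{account}.dfs.core.windows.net"


def abfss_uri(account: str, container: str, path: str = "") -> str:
    """abfss:// URI in the form container@account, as used by Spark-based engines."""
    clean_path = path.strip("/")  # avoid doubled or trailing slashes
    return f"abfss://{container}@{account}.dfs.core.windows.net/{clean_path}"


# Hypothetical names, for illustration only.
print(dfs_endpoint("mydatalake"))
print(abfss_uri("mydatalake", "raw", "/sales/2024/orders.csv"))
```

A compute engine would typically be pointed at the `abfss://` form, while REST clients and SDKs talk to the `https://` endpoint.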

  • ADLS uses a layered access control model:

    • Azure role-based access control (Azure RBAC): coarse-grained access granted through role assignments, e.g. read or write access to the data in a storage account or container.
    • Azure attribute-based access control (Azure ABAC): builds on Azure RBAC by adding conditions to role assignments.
    • Access control lists (ACLs): fine-grained control at the directory and file level. Permissions aren't automatically inherited from parent directories after a child item is created. However, you can configure default permissions on a parent directory, which are then applied to new child items at the time they're created.
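
The ACL behaviour described above (defaults are applied once, when a child is created, and never retroactively) can be sketched with a toy in-memory model. This is a simplified illustration, not the Azure SDK: the `Node` class, the principal names, and the permission strings are all hypothetical, and real ADLS also applies a mask and umask that are ignored here.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Node:
    """Toy stand-in for a file or directory; not an Azure SDK type."""
    name: str
    is_dir: bool
    access_acl: Dict[str, str] = field(default_factory=dict)   # principal -> permissions
    default_acl: Dict[str, str] = field(default_factory=dict)  # only meaningful on directories
    children: Dict[str, "Node"] = field(default_factory=dict)

    def create_child(self, name: str, is_dir: bool = False) -> "Node":
        # The parent's default ACL is copied once, at creation time.
        child = Node(name, is_dir, access_acl=dict(self.default_acl))
        if is_dir:
            # Child directories also carry the default ACL forward.
            child.default_acl = dict(self.default_acl)
        self.children[name] = child
        return child


root = Node("sales", is_dir=True, default_acl={"analysts": "r-x"})
old_file = root.create_child("2023.csv")

# Changing the parent's default ACL later does not touch existing children.
root.default_acl["engineers"] = "rwx"
new_file = root.create_child("2024.csv")

print(old_file.access_acl)  # {'analysts': 'r-x'}
print(new_file.access_acl)  # {'analysts': 'r-x', 'engineers': 'rwx'}
```

In practice this means existing items need their ACLs updated explicitly if the defaults on a parent directory change.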
  • Data processing requires fewer computational resources because data is organized into directories and sub-directories, like a file system.
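
One way to see why a directory structure can save work is to compare renaming a "folder" in a flat key space with renaming it in a tree. The sketch below is a toy model using plain Python dictionaries, not how the storage service is actually implemented; the paths and function names are made up for illustration.

```python
# Flat namespace: every object is an independent key; "directories" are only prefixes.
flat = {
    "raw/2024/a.csv": b"...",
    "raw/2024/b.csv": b"...",
    "raw/2023/c.csv": b"...",
}


def rename_prefix_flat(store: dict, old: str, new: str) -> int:
    """Rename a virtual directory by rewriting every matching key; returns keys touched."""
    matches = [key for key in store if key.startswith(old)]
    for key in matches:
        store[new + key[len(old):]] = store.pop(key)
    return len(matches)


# Hierarchical namespace: directories are real nodes that hold their children.
tree = {"raw": {"2024": {"a.csv": b"...", "b.csv": b"..."}, "2023": {"c.csv": b"..."}}}


def rename_dir_tree(parent: dict, old: str, new: str) -> int:
    """Rename a directory by re-linking a single entry in its parent; returns entries touched."""
    parent[new] = parent.pop(old)
    return 1


print(rename_prefix_flat(flat, "raw/2024/", "raw/archive-2024/"))  # 2
print(rename_dir_tree(tree["raw"], "2024", "archive-2024"))        # 1
```

In the flat model the cost grows with the number of objects under the prefix, while in the tree model it stays at one entry regardless of how many files the directory holds.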

  • Data redundancy: Data Lake Storage inherits all Azure Blob Storage replication models:

    • Locally redundant storage (LRS) keeps multiple copies within a single data center.
    • Zone-redundant storage (ZRS) replicates data across availability zones in the same region.
    • Geo-redundant storage (GRS) or read-access geo-redundant storage (RA-GRS) replicates data to a secondary region.
    • Geo-zone-redundant storage (GZRS or RA-GZRS) combines zone and geographic redundancy.
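
To make the differences between the replication models easier to compare, here is a small lookup sketch. The `Redundancy` tuple and the `pick` helper are hypothetical names introduced for illustration; only the option abbreviations come from the list above.

```python
from typing import NamedTuple, Optional


class Redundancy(NamedTuple):
    name: str
    zone_redundant: bool      # copies spread across availability zones in the primary region
    geo_redundant: bool       # copies replicated to a secondary region
    secondary_readable: bool  # "RA-" variants: the secondary region can be read directly


OPTIONS = [
    Redundancy("LRS", False, False, False),
    Redundancy("ZRS", True, False, False),
    Redundancy("GRS", False, True, False),
    Redundancy("RA-GRS", False, True, True),
    Redundancy("GZRS", True, True, False),
    Redundancy("RA-GZRS", True, True, True),
]


def pick(zone: bool, geo: bool, read_secondary: bool = False) -> Optional[str]:
    """Return the option matching the requested properties, or None if no option fits."""
    for opt in OPTIONS:
        if (opt.zone_redundant, opt.geo_redundant, opt.secondary_readable) == (zone, geo, read_secondary):
            return opt.name
    return None  # e.g. read access to a secondary region without geo-replication


print(pick(zone=True, geo=True))                        # GZRS
print(pick(zone=False, geo=True, read_secondary=True))  # RA-GRS
```

The choice is essentially a trade-off between cost and how large a failure (data center, zone, or region) the data needs to survive.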
