DEV Community

Alec Dutcher
Alec Dutcher

Posted on • Updated on

DP-203 Study Guide - Design and implement the data exploration layer

Study guide

Create and execute queries by using a compute solution that leverages SQL serverless and Spark cluster

  • Azure SQL Serverless
    • Not SQL database - it is SQL compute in Azure Synapse Analytics
    • Serverless SQL pool
      • Built in to Synapse
      • Always available
      • Billed based on usage
    • Data access
      • No data storage
      • Data accessed through ADL
      • OPENROWSET syntax to access data
    • Provisioned resources
      • Dedicated SQL pool
      • Static number of servers
      • User chooses runtime
      • Defined cost per data warehouse unit (DWU)
      • Data is stored in relational tables using columnar storage
    • Used mainly for EDA
    • In the portal
      • Develop section on left-side panel
      • Click + button and add SQL script
      • Select tables from lake DB or SQL DB
      • Choose SQL pool settings
      • SELECT * FROM OPENROWSET( BULK '', FORMAT = 'parquet') AS [result]
      • SELECT * FROM OPENROWSET( BULK '', FORMAT = 'CSV', Parser_Version = '2.0') AS [result]
      • Can also go to Data in left-side panel and link storage account and containers - this can be used to auto-generate basic SELECT queries
  • Spark clusters
    • Apache Spark in Synapse
    • In-memory cluster computing
    • Synapse offers ease of use and creation
    • Data access is interacting with Spark pools through notebooks (similar to Databricks)
    • Databases and tables created in a Spark pool are replicated in a serverless SQL pool as read-only
    • In the portal
      • Under Develop in left-side panel
      • Click + button and select or create a notebook
      • Under Manage on left-side panel, create and run an Apache Spark pool
      • Be sure to enable automatic pausing, Spark pools are expensive

Recommend and implement Azure Synapse Analytics database templates

  • Database Templates
    • Speed up design process
    • Create more thorough databases
  • Lake database in Synapse
    • Data lakes lack structure
    • Databases can be too structured
    • Lake database removes these downsides
    • Provides structured DB with meta info, stored in a data lake (parquet, delta, CSV formats)
    • Powered by serverless Synapse compute
  • In the portal
    • Architecture process
      • Access Synapse Studio instance
      • Create a Lake Database
      • Add a Table
      • Add Template
      • Select relevant features

Push new or updated data lineage to Microsoft Purview

  • Microsoft Purview
    • Unified data governance
    • On-prem, multi-cloud, SaaS
    • 4 pillars
      • Data quality
      • Data stewardship
      • Data protection and compliance
      • Data management
    • Data lifecycle management
      • Data catalog - organized inventory of data assets
      • Data estate insights - infrastructure helps organizations manage data
      • Data sharing - internally or across orgs
      • Data policy - provision access to data at scale
    • Primary use cases for Purview
      • Pull data from SQL DB and ADL and provide governance across the org
      • Financial services can show where critical data is stored to evaluate security risk
      • Large, diverse orgs can enable data democratization
  • Data lineage
    • Track data flow over time
    • Origination --> Delta (data changes) --> Sink (output)
    • Provides confidence in data
    • Facilitates governance and impact analysis
    • In the portal
      • Lineage tab shows a flow chart with sources, processes, and targets

Browse and search metadata in Microsoft Purview Data Catalog

  • In the portal
    • Open the Microsoft Purview Governance Portal
    • Data Catalog --> Browse --> By collection or source type
    • Go to Data map in left-side panel to register data sources
      • Data map --> Data sources --> Register
      • Need to do a new scan to establish lineage
      • Requires access control to be configured to allow Purview to scan the data sources

Top comments (0)