DEV Community

Alec Dutcher

DP-203 Study Guide - Design and implement the data exploration layer

Study guide

Create and execute queries by using a compute solution that leverages SQL serverless and Spark cluster

  • Serverless SQL in Azure Synapse
    • Not Azure SQL Database - it is a SQL compute engine inside Azure Synapse Analytics
    • Serverless SQL pool
      • Built in to Synapse
      • Always available
      • Billed based on usage
    • Data access
      • No data storage of its own
      • Data is read directly from Azure Data Lake (ADL) storage
      • OPENROWSET syntax is used to query the files
    • Provisioned resources
      • Dedicated SQL pool
      • Static number of servers
      • User chooses when the pool runs (it can be paused and resumed)
      • Defined cost per data warehouse unit (DWU)
      • Data is stored in relational tables using columnar storage
    • Used mainly for exploratory data analysis (EDA)
    • In the portal
      • Develop section on left-side panel
      • Click + button and add SQL script
      • Select tables from lake DB or SQL DB
      • Choose SQL pool settings
      • SELECT * FROM OPENROWSET(BULK '', FORMAT = 'PARQUET') AS [result]
      • SELECT * FROM OPENROWSET(BULK '', FORMAT = 'CSV', PARSER_VERSION = '2.0') AS [result]
      • Alternatively, go to Data in the left-side panel and link a storage account and its containers - Synapse can then auto-generate basic SELECT queries for the files
  • Spark clusters
    • Apache Spark in Synapse
    • In-memory cluster computing
    • Synapse makes Spark pools easy to create and use
    • Data is accessed by running notebooks against a Spark pool (similar to Databricks)
    • Databases and tables created in a Spark pool are replicated in a serverless SQL pool as read-only
    • In the portal
      • Under Develop in left-side panel
      • Click + button and select or create a notebook
      • Under Manage on left-side panel, create and run an Apache Spark pool
      • Be sure to enable automatic pausing, because Spark pools are expensive to leave running
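
The two OPENROWSET queries in the serverless SQL notes differ only in their options, so a small helper can assemble them. This is a minimal Python sketch, not part of Synapse: `openrowset_query`, `SUPPORTED_FORMATS`, and the storage URL are hypothetical names chosen for illustration, and the resulting string would still need to be run against a serverless SQL pool.

```python
from typing import Optional

# Formats accepted by this sketch (hypothetical constant, not a Synapse API).
SUPPORTED_FORMATS = {"PARQUET", "CSV", "DELTA"}


def openrowset_query(bulk_path: str, file_format: str,
                     parser_version: Optional[str] = None,
                     top: Optional[int] = None) -> str:
    """Build a SELECT ... FROM OPENROWSET(...) statement as a string."""
    fmt = file_format.upper()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {file_format}")
    if parser_version is not None and fmt != "CSV":
        raise ValueError("PARSER_VERSION only applies to CSV files")

    # Double any single quotes so the path stays a valid T-SQL string literal.
    safe_path = bulk_path.replace("'", "''")

    options = [f"BULK '{safe_path}'", f"FORMAT = '{fmt}'"]
    if parser_version is not None:
        options.append(f"PARSER_VERSION = '{parser_version}'")

    # Limiting rows keeps exploratory queries cheap on usage-based billing.
    select = "SELECT *" if top is None else f"SELECT TOP {int(top)} *"
    return f"{select} FROM OPENROWSET({', '.join(options)}) AS [result]"


# Hypothetical storage path, used only to show the shape of the output.
print(openrowset_query(
    "https://examplelake.dfs.core.windows.net/raw/sales/*.parquet",
    "parquet",
    top=10,
))
```

Because the serverless pool is billed on usage, adding a `TOP` clause during exploration is a common way to keep ad hoc queries small.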

Recommend and implement Azure Synapse Analytics database templates

  • Database Templates
    • Speed up design process
    • Create more thorough databases
  • Lake database in Synapse
    • Data lakes lack structure
    • Databases can be too structured
    • Lake database removes these downsides
    • Provides a structured DB with metadata, while the data itself is stored in a data lake (parquet, delta, CSV formats)
    • Powered by serverless Synapse compute
  • In the portal
    • Architecture process
      • Access Synapse Studio instance
      • Create a Lake Database
      • Add a Table
      • Add Template
      • Select relevant features

Push new or updated data lineage to Microsoft Purview

  • Microsoft Purview
    • Unified data governance
    • Covers on-premises, multi-cloud, and SaaS data
    • 4 pillars
      • Data quality
      • Data stewardship
      • Data protection and compliance
      • Data management
    • Data lifecycle management
      • Data catalog - organized inventory of data assets
      • Data estate insights - reporting that helps organizations understand and manage their data
      • Data sharing - internally or across orgs
      • Data policy - provision access to data at scale
    • Primary use cases for Purview
      • Pull metadata from SQL DB and ADL and provide governance across the org
      • Financial services can show where critical data is stored to evaluate security risk
      • Large, diverse orgs can enable data democratization
  • Data lineage
    • Track data flow over time
    • Origination --> Delta (data changes) --> Sink (output)
    • Provides confidence in data
    • Facilitates governance and impact analysis
    • In the portal
      • Lineage tab shows a flow chart with sources, processes, and targets
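
The idea of lineage as sources, processes, and targets, and its use for impact analysis, can be sketched as a small directed graph. This is an illustrative Python model only and does not use the Purview API: `LineageGraph`, `add_flow`, `impacted_by`, and all asset names are hypothetical.

```python
from collections import defaultdict, deque


class LineageGraph:
    """Toy lineage model: each flow is source -> process -> target."""

    def __init__(self):
        # Maps a node to the set of nodes directly downstream of it.
        self._downstream = defaultdict(set)

    def add_flow(self, source, process, target):
        """Record that `process` reads `source` and writes `target`."""
        self._downstream[source].add(process)
        self._downstream[process].add(target)

    def impacted_by(self, asset):
        """Return every node downstream of `asset` (breadth-first search)."""
        seen = set()
        queue = deque([asset])
        while queue:
            node = queue.popleft()
            for nxt in self._downstream.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        seen.discard(asset)  # an asset does not count as impacting itself
        return sorted(seen)


# Hypothetical assets: a raw file feeds a table, which feeds a summary table.
graph = LineageGraph()
graph.add_flow("adls:/raw/orders.csv", "pipeline:clean_orders", "sql:dbo.orders")
graph.add_flow("sql:dbo.orders", "pipeline:daily_rollup", "sql:dbo.order_summary")

for node in graph.impacted_by("adls:/raw/orders.csv"):
    print(node)
```

Asking what is downstream of a source is the impact-analysis question: if the raw file changes or breaks, every node the traversal returns is potentially affected.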

Browse and search metadata in Microsoft Purview Data Catalog

  • In the portal
    • Open the Microsoft Purview Governance Portal
    • Data Catalog --> Browse --> By collection or source type
    • Go to Data map in left-side panel to register data sources
      • Data map --> Data sources --> Register
      • Run a new scan after registering a source to establish lineage
      • Scanning requires access control to be configured so that Purview is allowed to read the data sources
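
Browsing by collection or by source type amounts to grouping the same registered assets on different attributes. The sketch below shows that idea with an in-memory list in Python; it does not call Purview, and `Asset`, `browse`, `search`, and the sample assets are hypothetical names used for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    """Minimal stand-in for a catalog entry (hypothetical fields)."""
    name: str
    collection: str
    source_type: str


def browse(assets, by):
    """Group asset names by 'collection' or 'source_type'."""
    if by not in ("collection", "source_type"):
        raise ValueError("by must be 'collection' or 'source_type'")
    groups = defaultdict(list)
    for asset in assets:
        groups[getattr(asset, by)].append(asset.name)
    return {key: sorted(names) for key, names in sorted(groups.items())}


def search(assets, keyword):
    """Case-insensitive keyword match on asset names."""
    needle = keyword.lower()
    return sorted(a.name for a in assets if needle in a.name.lower())


# Hypothetical catalog contents.
catalog = [
    Asset("dbo.orders", "Sales", "Azure SQL Database"),
    Asset("orders.csv", "Sales", "Azure Data Lake Storage Gen2"),
    Asset("dbo.employees", "HR", "Azure SQL Database"),
]

print(browse(catalog, by="collection"))
print(browse(catalog, by="source_type"))
print(search(catalog, "orders"))
```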
