Connect to Azure Data Warehouse from Azure Databricks

#azure #databricks #keyvault #adw

Create a secret scope

A secret scope will allow you to use Azure KeyVault to download all secret information to connect to Azure Data Warehouse, e.g.: username/password, etc. In our example, this will be the whole connection string.

Navigate to https://{region}.azuredatabricks.net/?o={object_id}#secrets/createScope

Grab the DNS and Resource ID from your Azure KeyVault Properties

Get required secrets from azure keyvault and set Spark configuration

blob_storage_url = dbutils.secrets.get(scope = "databricks_scope", key = "blobstorageurl")

blob_storage_key = dbutils.secrets.get(scope = "databricks_scope", key = "blobstoragekey")

adw_connection = dbutils.secrets.get(scope = "databricks_scope", key = "dwconnection")

blob_storage_temp_dir = dbutils.secrets.get(scope = "databricks_scope", key = "blobstoragetempdir")

spark.conf.set(blob_storage_url, blob_storage_key)

Spark driver to SQL DW

The Spark driver connects to SQL DW via JDBC using a username and password. We recommended that you use the connection string provided by Azure portal, which enables Secure Sockets Layer (SSL) encryption for all data sent between the Spark driver and the SQL DW instance through the JDBC connection. To verify that the SSL encryption is enabled, you can search for encrypt=true in the connection string. To allow the Spark driver to reach SQL DW, we recommend that you set Allow access to Azure services to ON on the firewall pane of the SQL DW server through Azure portal. This setting allows communications from all Azure IP addresses and all Azure subnets, which allows Spark drivers to reach the SQL DW instance.

For more info see here

Load data from a SQL DW query into a Spark DataFrame.

df = (spark
      .read
      .format("com.databricks.spark.sqldw")
      .option("url", adw_connection)
      .option("tempDir", blob_storage_temp_dir)
      .option("forwardSparkAzureStorageCredentials", "true")
      .option("query", "select count(*) as counter from my_table")).load()

df.show()

That's it! Quick and easy.

DEV Community

Connect to Azure Data Warehouse from Azure Databricks

Create a secret scope

Get required secrets from azure keyvault and set Spark configuration

Spark driver to SQL DW

Load data from a SQL DW query into a Spark DataFrame.

Top comments (0)

Read next

Designing Scalable APIs

Study Shows AI Code Generators Only 60% Accurate, Half With Security Flaws

AI Breakthrough: 90% Faster 3D Object Detection Using Text-Guided Processing

New Framework Shows How to Find Hidden Weaknesses in AI Language Models