Access Blob Storage from Azure Databricks using Azure Key Vault

To access Blob Storage from Databricks, you need to add a secret scope. A secret scope is a collection of secrets identified by a name.

There are two types of secret scopes:

  1. Azure Key Vault-backed – secrets are stored in an Azure Key Vault that you manage and reference from Databricks.
  2. Databricks-backed – secrets are stored in an encrypted database owned and managed by Azure Databricks.
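Once you have created a scope (step 6 below), you can verify it from a notebook. A minimal sketch using the dbutils secrets utilities:

# List every secret scope in the workspace
dbutils.secrets.listScopes()

# List the secret names (values are never shown) inside one scope
dbutils.secrets.list("databricks-secret-scope")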

Below is a step-by-step guide to accessing Blob Storage from Databricks.

Assumptions: an Azure Databricks workspace and an Azure Storage account are already provisioned.

  1. Make note of the access key (key1) in the Blob Storage account.
  2. Create an Azure Key Vault
    • Name: databricks-isv-kv
    • Subscription: Choose a subscription
    • Resource group: Choose a resource group
  3. Once the key vault is created, go to the Secrets menu and generate a new secret.

Name: DbksStorageKey

Value: the key value copied from the storage account access keys (key1 from step 1)

Click Create to save the secret.
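If you prefer to script this step instead of using the portal, below is a minimal sketch using the azure-keyvault-secrets Python SDK. It assumes you are authenticated locally (for example via an Azure CLI login) so that DefaultAzureCredential can pick up a credential:

from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

# Vault URL follows from the key vault name used above (databricks-isv-kv)
client = SecretClient(vault_url = "https://databricks-isv-kv.vault.azure.net",
                      credential = DefaultAzureCredential())

# Store the storage account access key as the DbksStorageKey secret
client.set_secret("DbksStorageKey", "<key1 value copied from the storage account>")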

4. Go to the Properties menu of the key vault and note down the DNS Name (Vault URI) and the Resource ID.

5. Go to your Azure Databricks workspace and append #secrets/createScope to the end of the workspace URL.

6. You will be navigated to a page where you can create the secret scope. Provide the following parameters:

Scope name: databricks-secret-scope

DNS Name: the DNS Name/URI noted from the key vault

Resource ID: the Resource ID noted from the key vault

7. Now the secret scope to access the Blob Storage is created. Next, mount the Blob Storage in your code.

# Configuration key that points at the storage account access key
conf_key = "fs.azure.account.key.<your-storage-account-name>.blob.core.windows.net"

dbutils.fs.mount(
  source = "wasbs://<your-container-name>@<your-storage-account-name>.blob.core.windows.net",
  mount_point = "/mnt/<mount-name>",
  extra_configs = {conf_key: dbutils.secrets.get(scope = "databricks-secret-scope", key = "DbksStorageKey")}
)
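To sanity-check the mount, list its contents from the notebook (assuming the container already holds at least one file):

display(dbutils.fs.ls("/mnt/<mount-name>"))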

Now please note that the above steps work only if you are accessing your Blob Storage via a public endpoint. What happens if it is not exposed to the public?

Generally, if you are processing a company's internal data, the catch is that your storage account will not be publicly accessible. Most probably you or an admin will expose a private endpoint accessible only to specific users or roles. In this scenario, steps 3 and 7 will be a little different. Moreover, you will also need to create a Shared Access Signature (SAS).

Let's look at step 3 first.

The best way to go about this is to create three different secrets: ClientId, ClientSecret and TenantId (the credentials of an Azure AD service principal that has been granted access to the storage account).

Another step to add after this is to create a Shared Access Signature, if your storage account does not allow anonymous access to your container.

You can do this using Azure Storage Explorer: right-click on your storage account and click on 'Get Shared Access Signature'.

Make sure the permissions you need (for reading data, typically Read and List) are checked, then click Create.

Once this is created, copy the query string (the SAS token) somewhere safe; you will need it in step 7 below.
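Rather than leaving the SAS token lying around, one option is to store it in the same key vault next to the other secrets. A sketch using the same azure-keyvault-secrets SDK as above (SasToken is a hypothetical secret name; pick whatever fits your convention):

from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

client = SecretClient(vault_url = "https://databricks-isv-kv.vault.azure.net",
                      credential = DefaultAzureCredential())

# "SasToken" is a hypothetical name for the SAS query string copied from Storage Explorer
client.set_secret("SasToken", "<SAS query string from Storage Explorer>")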

Now for step 7.

tenantId = dbutils.secrets.get(scope = "databricks-secret-scope", key = "TenantId")

configs = {
  "fs.azure.account.auth.type": "OAuth",
  "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
  "fs.azure.account.oauth2.client.id": dbutils.secrets.get(scope = "databricks-secret-scope", key = "ClientId"),
  "fs.azure.account.oauth2.client.secret": dbutils.secrets.get(scope = "databricks-secret-scope", key = "ClientSecret"),
  "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/" + tenantId + "/oauth2/token",
  "fs.azure.sas.<container name>.<storage account name>.blob.core.windows.net": "<Shared Access Signature>"
}

dbutils.fs.mount(
  source = "wasbs://<container name>@<storage account name>.blob.core.windows.net",
  mount_point = "/mnt/<MountName>",
  extra_configs = configs
)
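As before, a quick check that the mount works, plus the unmount call you will need if you ever remount with fresh credentials:

# List the mounted container to confirm access
display(dbutils.fs.ls("/mnt/<MountName>"))

# Remove the mount before remounting with new configs
# dbutils.fs.unmount("/mnt/<MountName>")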


