DBT and Databricks Part 1: Setting up DBT profile for connecting to Azure Databricks using…
This series of blog posts will illustrate how to use DBT with Azure Databricks: set up a connection profile, work with python models, and copy noSQL data into Databricks(from MongoDB). In the first part, we will talk about how to set up a profile when using dbt-databricks python package.
Install python package dbt-databricks using pip install
or add to requirements.txt
file. This package requires DBT profile (profiles.yml
) to be created with all the credentials as described here:
This profile can be created automatically using dbt init
command or manually. We use both: SQL (running on Databricks SQL Warehouse) and pyspark (running on Databricks cluster) DBT models. So, we will configure our profile against Databricks SQL endpoint and add cluster configuration to dbt_project.yml
so new clusters are created for each model(discussed in the next part).
We will be using environment variables to update credentials in profiles.yml
, so it will look like this:
Creating DBT profile:
- Create SQL Warehouse in Databricks and click Connection details
2. Copy the value of the Server hostname; it will be host
in DBT profile (or DBT_DB_HOST
env variable)
3. Copy the HTTP path; it will be http_path
in DBT profile (or DBT_DB_PATH
environment variable).
4. Set up database(schema) and catalog names, where DBT will save the results.
5. Token(DBT_DB_TOKEN
env variable):
dbt-databricks
is looking for a Personal Access Token(PAT) for this value. PAT tokens can be easily created for Databricks users, however, the best practice is to use Service Principals to run Databricks DBT models. Creating a Service Principal for DBT and PAT tokens is a bit trickier process. Let’s go through it step by step:
Create Service Principal for DBT:
Follow the process described here:
1. Go to Microsoft Entra ID(former Active Directory)-> App registrations and Click + New registration.
2. For Name, enter a name for the application.
3. In the Supported account types section, select Accounts in this organizational directory only (Single tenant).
4. Click Register.
5. Click on the newly created Application and choose Certificates and Secrets:
6. On the Client secrets tab, click New client secret.
7. Add Description and Expiry date and click Add.
8. Save secret value, click on Overview, and save the Application (client) ID and Directory (tenant) ID for this Application:
9. Go to Databricks, click on the menu in the top right corner → Admin settings → Service Principles (this process is different if identity federation is enabled; please check the reference).
10. Click Add Service principal → Add New.
11. Enter the name of the service principal and the application’s Application (client) ID saved in step 8.
Grant DBT Service Principal all needed permissions to manage Databricks resources (out of scope of this article)
Create Entra ID access token for DBT Service Principal:
Follow the process as described here:
- Run the following commands to get the Microsoft Entra ID access token (replace
with Application (client) ID saved in step 8, replace
with Directory (tenant) ID saved in step 8, replace
with the secret value saved in step 8) :
curl -X POST -H 'Content-Type: application/x-www-form-urlencoded' \
https://login.microsoftonline.com//oauth2/v2.0/token \
-d 'client_id=' \
-d 'grant_type=client_credentials' \
-d 'scope=2ff814a6-3304-4ab8-85cb-cd0e6f879c1d%2F.default' \
-d 'client_secret='
2. The Microsoft Entra ID access token is in the access_token
value within the output of the call. This token expires after only 60 minutes. It can be used for DBT but needs to be refreshed every time before running DBT and will not allow any operation that lasts more than 60 minutes.
Create Databricks PAT token for DBT Service Principal
As mentioned at the bottom of this document, The Microsoft Entra ID access token can be used to call the Tokens API to create a Databricks PAT token for the Service Principal.
- First, allow Service Principals to create PAT tokens as described here:
2. In the Databricks right top menu, click on Admin Settings → Workspace Settings → Personal Access Tokens → Permission settings and add Service Principals:
3. Add Can Use permission for the Service Principal we are interested in.
4. Run the following command to generate the PAT token (
- replace
with the Entra ID access token we just generated, - replace
with the name of your Databricks deployment, for example, XXXXXX.azuredatabricks.net, - set a lifetime for a token in seconds, or if the lifetime is not specified, this token remains valid indefinitely):
curl --request POST "https:///api/2.0/token/create" \
--header "Authorization: Bearer " \
--data '{ "lifetime_seconds": X, "comment": "DBT Service Principal access token"}'
5. This command will generate a PAT Token for the service principal, and it can be saved as a token (DBT_DB_TOKEN
env variable in DBT profile).
The DBT profile is ready and can be used to run DBT on Azure Databricks!