Databricks
Databricks is a unified data intelligence platform.
Prerequisites
- A JDK installation
- The JDBC URL for the Databricks cluster
Setup
Environment Variables
Add the following to a `.env` file in your Cube project:
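A minimal `.env` might look like the following sketch. The driver type `databricks-jdbc`, host, HTTP path, and token values are placeholders to replace with your own workspace details (find the JDBC URL in your cluster's connection settings):

```dotenv
CUBEJS_DB_TYPE=databricks-jdbc
CUBEJS_DB_NAME=default
# Replace the host and HTTP path with values from your cluster's JDBC settings
CUBEJS_DB_DATABRICKS_URL=jdbc:databricks://<workspace-host>:443;transportMode=http;ssl=1;httpPath=<http-path>
CUBEJS_DB_DATABRICKS_TOKEN=<personal-access-token>
CUBEJS_DB_DATABRICKS_ACCEPT_POLICY=true
```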
Docker
Create a `.env` file as above, then extend the
`cubejs/cube:jdk` Docker image tag to build a Cube image with the JDBC driver:
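As a sketch, a `docker-compose.yml` for this setup might look like the following (the port mapping and volume path are illustrative):

```yaml
services:
  cube:
    # The jdk tag bundles a JDK so the Databricks JDBC driver can run
    image: cubejs/cube:jdk
    ports:
      - 4000:4000
    # Loads the Databricks connection settings defined above
    env_file: .env
    volumes:
      - .:/cube/conf
```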
Environment Variables
| Environment Variable | Description | Possible Values | Required |
|---|---|---|---|
| CUBEJS_DB_NAME | The name of the database to connect to | A valid database name | ✅ |
| CUBEJS_DB_DATABRICKS_URL | The URL for a JDBC connection | A valid JDBC URL | ✅ |
| CUBEJS_DB_DATABRICKS_ACCEPT_POLICY | Whether or not to accept the license terms for the Databricks JDBC driver | true, false | ✅ |
| CUBEJS_DB_DATABRICKS_OAUTH_CLIENT_ID | The OAuth client ID for service principal authentication | A valid client ID | ❌ |
| CUBEJS_DB_DATABRICKS_OAUTH_CLIENT_SECRET | The OAuth client secret for service principal authentication | A valid client secret | ❌ |
| CUBEJS_DB_DATABRICKS_TOKEN | The personal access token used to authenticate the Databricks connection | A valid token | ❌ |
| CUBEJS_DB_DATABRICKS_CATALOG | The name of the Databricks catalog to connect to | A valid catalog name | ❌ |
| CUBEJS_DB_EXPORT_BUCKET_MOUNT_DIR | The path for the Databricks DBFS mount (not needed when using a Unity Catalog connection) | A valid mount path | ❌ |
| CUBEJS_DB_MAX_POOL | The maximum number of concurrent database connections to pool (default: 8) | A valid number | ❌ |
| CUBEJS_CONCURRENCY | The number of concurrent queries to the data source | A valid number | ❌ |
Pre-Aggregation Feature Support
count_distinct_approx
Measures of type `count_distinct_approx` can
be used in pre-aggregations when using Databricks as a source database. To learn
more about Databricks' support for approximate aggregate functions, click
here.
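For illustration, a hypothetical `orders` cube could define such a measure and reference it from a pre-aggregation (the cube, measure, and column names here are made up):

```yaml
cubes:
  - name: orders
    sql_table: orders

    measures:
      - name: unique_users
        # Backed by Databricks' approximate distinct-count support
        type: count_distinct_approx
        sql: user_id

    dimensions:
      - name: created_at
        sql: created_at
        type: time

    pre_aggregations:
      - name: users_per_day
        measures:
          - unique_users
        time_dimension: created_at
        granularity: day
```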
Pre-Aggregation Build Strategies
To learn more about pre-aggregation build strategies, head
here.
| Feature | Works with read-only mode? | Is default? |
|---|---|---|
| Simple | ✅ | ✅ |
| Export Bucket | ❌ | ❌ |
Simple
No extra configuration is required for simple pre-aggregation builds with Databricks.
Export Bucket
Databricks supports AWS S3, Azure Blob Storage, and Google Cloud Storage for export bucket functionality.
AWS S3
To use AWS S3 as an export bucket, first complete the Databricks guide on connecting to cloud object storage using Unity Catalog.
Ensure the AWS credentials are correctly configured in IAM to allow reads and
writes to the export bucket in S3.
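With the bucket and IAM permissions in place, the export bucket is configured through environment variables; a sketch, where the bucket name, region, and credentials are placeholders:

```dotenv
CUBEJS_DB_EXPORT_BUCKET_TYPE=s3
CUBEJS_DB_EXPORT_BUCKET=my-export-bucket
CUBEJS_DB_EXPORT_BUCKET_AWS_KEY=<aws-access-key-id>
CUBEJS_DB_EXPORT_BUCKET_AWS_SECRET=<aws-secret-access-key>
CUBEJS_DB_EXPORT_BUCKET_AWS_REGION=<aws-region>
```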
Google Cloud Storage
When using an export bucket, remember to assign the Storage Object Admin
role to your Google Cloud credentials (
CUBEJS_DB_EXPORT_GCS_CREDENTIALS).
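A sketch of the corresponding configuration, assuming the service-account key is supplied as a base64-encoded JSON string (the bucket name is a placeholder):

```dotenv
CUBEJS_DB_EXPORT_BUCKET_TYPE=gcp
CUBEJS_DB_EXPORT_BUCKET=my-export-bucket
# Credentials for a service account with the Storage Object Admin role
CUBEJS_DB_EXPORT_GCS_CREDENTIALS=<base64-encoded-service-account-json>
```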